

Paper 2 - Research Methods
AS/A Level revision notes for Paper 2: Research Methods (AQA).
Overview – Research Methods
Research methods are how psychologists and scientists come up with and test their theories. The A level psychology syllabus covers several different types of studies and experiments used in psychology as well as how these studies are conducted and reported:
- Types of psychological studies (including experiments, observations, self-reporting, and case studies)
- Scientific processes (including the features of a study, how findings are reported, and the features of science in general)
- Data handling and analysis (including descriptive statistics and different ways of presenting data) and inferential testing
Note: Unlike all other sections across the 3 exam papers, research methods is worth 48 marks instead of 24. Not only that, the other sections often include a few research methods questions, so this topic is the most important on the syllabus!

Example question: Design a matched pairs experiment the researchers could conduct to investigate differences in toy preferences between boys and girls. [12 marks]
Types of study
There are several different ways a psychologist can research the mind, including:
- Experiments
- Observation
- Self-reporting
- Case studies
Each of these methods has its strengths and weaknesses. Different methods may be better suited to different research studies.
Experimental method
The experimental method looks at how variables affect outcomes. A variable is anything that changes between two situations (see below for the different types of variables). For example, Bandura’s Bobo the doll experiment looked at how changing the variable of the role model’s behaviour affected how the child played.
Experimental designs
Experiments can be designed in different ways, such as:
- Independent groups: Participants are divided into two groups. One group does the experiment under one condition of the independent variable, the other group under the other condition. Results between the groups are compared.
- Repeated measures: Participants are not divided into groups. Instead, all participants do the experiment under one condition, then afterwards the same participants repeat the experiment under the other condition. Results across the two conditions are compared.
A matched pairs design is a variation on the independent groups design. First, one group of participants is selected. Then, the researchers recruit a second group of participants one by one to match the characteristics of each member of the original group. This provides two relevantly similar groups and controls for differences between groups that might otherwise skew results. The experiment is then conducted as a normal independent groups design.
Types of experiment
Laboratory vs. field experiments
Experiments are carried out in two different types of settings:
- Laboratory experiments take place in a controlled, artificial environment. E.g. Bandura’s Bobo the doll experiment or Asch’s conformity experiments
- Field experiments take place in a natural, everyday environment. E.g. Bickman’s study of the effects of uniforms on obedience
Strengths of laboratory experiment over field experiment:
The controlled environment of a laboratory experiment minimises the risk that variables outside the researchers’ control skew the results of the trial, making it clearer what (if any) the causal effects of a variable are. Because the environment is tightly controlled, any change in outcome can be attributed with greater confidence to the change in the variable.
Weaknesses of laboratory experiment over field experiment:
However, the controlled nature of a laboratory experiment might reduce its ecological validity. Results obtained in an artificial environment might not translate to real life. Further, participants may be influenced by demand characteristics: They know they are taking part in a test, and so behave how they think they’re expected to behave rather than how they would naturally behave.
Natural and quasi experiment
Natural experiments are where variables vary naturally. In other words, the researcher can’t or doesn’t manipulate the variables . There are two types of natural experiment:
- E.g. studying the effect a change in drug laws (variable) has on addiction
- E.g. studying differences between men (variable) and women (variable)
Observational method
The observational method involves watching and recording behaviour. For example, Zimbardo’s prison study observed how participants behaved when given certain social roles.
Observational design
Behavioural categories
An observational study will use behavioural categories to prioritise which behaviours are recorded and ensure the different observers are consistent in what they are looking for.
For example, a study of the effects of age and sex on stranger anxiety in infants might code behaviours into categories rather than writing complete descriptions of them: for instance, IS = interacted with stranger, and AS = avoided stranger. Researchers can also create numerical ratings to categorise behaviour, such as rating an infant’s anxiety on a numerical scale.
Inter-observer reliability: In order for observations to produce reliable findings, it is important that observers all code behaviour in the same way. For example, researchers would have to make it very clear to the observers what distinguishes a ‘3’ from a ‘7’ on an anxiety rating scale. This inter-observer reliability prevents the subjective interpretations of different observers from skewing the findings.
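One common way to put a number on inter-observer reliability is percentage agreement: the proportion of observation intervals where two observers recorded the same code. The Python sketch below is illustrative only – the behaviour codes (IS = interacted with stranger, AS = avoided stranger) and the observation records are invented:

```python
# Percentage agreement: a simple measure of inter-observer reliability.
# Behaviour codes and observation data are hypothetical.

def percentage_agreement(observer_a, observer_b):
    """Return the proportion of intervals where two observers
    recorded the same behaviour code."""
    if len(observer_a) != len(observer_b):
        raise ValueError("observers must code the same number of intervals")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return matches / len(observer_a)

# Two observers independently code the same six observation intervals.
obs_a = ["IS", "AS", "IS", "IS", "AS", "IS"]
obs_b = ["IS", "AS", "AS", "IS", "AS", "IS"]

print(percentage_agreement(obs_a, obs_b))  # 5 of 6 intervals agree
```

A low agreement score would signal that the behavioural categories need clearer definitions before the full study is run.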
Event and time sampling
Because behaviour is constant and varied, it may not be possible to record every single behaviour during the observation period. So, in addition to categorising behaviour, study designers will also decide when to record a behaviour:
- Event sampling: Counting how many times the participant behaves in a certain way.
- Time sampling: Recording participant behaviour at regular time intervals. For example, making notes of the participant’s behaviour after every 1 minute has passed.
Note: Don’t get event and time sampling confused with participant sampling, which is how researchers select participants to study from a population.
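The two recording strategies above can be sketched in a few lines of Python. Everything here is hypothetical – the behaviour codes (IS = interacted with stranger, AS = avoided stranger) and the logs are invented for illustration:

```python
from collections import Counter

# Event sampling: count every occurrence of each target behaviour.
events = ["IS", "AS", "IS", "IS", "AS"]  # hypothetical behaviour log
event_tally = Counter(events)
print(event_tally["IS"], event_tally["AS"])  # 3 2

# Time sampling: record only the behaviour observed at regular intervals.
# A hypothetical second-by-second log, sampled once every 60 seconds.
per_second_log = ["IS"] * 90 + ["AS"] * 90
time_sampled = per_second_log[::60]  # one snapshot per minute
print(time_sampled)
```

Event sampling keeps a complete tally but only for the chosen behaviours; time sampling trades completeness for a manageable, evenly spaced record.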
Types of observation
Naturalistic vs. controlled
Observations can be made in either a naturalistic or a controlled setting:
- Naturalistic observation: Behaviour is observed in the environment where it would normally occur. E.g. setting up cameras in an office or school to observe how people interact in those environments
- Controlled observation: Behaviour is observed under conditions arranged by the researcher. E.g. Ainsworth’s strange situation or Zimbardo’s prison study
Covert vs. overt
Observations can be either covert or overt:
- Covert observation: Participants do not know they are being observed. E.g. setting up hidden cameras in an office
- Overt observation: Participants know they are being observed. E.g. Zimbardo’s prison study
Participant vs. non-participant
In observational studies, the researcher/observer may or may not participate in the situation being observed:
- Participant observation: The observer takes part in the situation being observed. E.g. in Zimbardo’s prison study, Zimbardo played the role of prison superintendent himself
- Non-participant observation: The observer watches from outside the situation. E.g. in Bandura’s Bobo the doll experiment and Ainsworth’s strange situation, the observers did not interact with the children being observed
Self-report method
Self-report methods get participants to provide information about themselves. Information can be obtained via questionnaires or interviews.
Types of self-report
Questionnaires
A questionnaire is a standardised list of questions that all participants in a study answer. For example, Hazan and Shaver used questionnaires to collate self-reported data from participants in order to identify correlations between attachment as infants and romantic attachment as adults.
Questions in a questionnaire can be either open or closed:
- Closed questions: The participant picks from a fixed set of answers. E.g. “Are you religious? (yes/no)” or “How many hours do you sleep per night?” with options such as “<6 hours”, “6-8 hours”, and “>8 hours”
- Open questions: The participant answers freely in their own words. E.g. “How did you feel when you thought you were administering a lethal shock?” or “What do you look for in a romantic partner and why?”
Strengths of questionnaires:
- Quantifiable: Closed questions provide quantifiable data in a consistent format, which enables researchers to analyse the information statistically and objectively.
- Replicability: Because questionnaires are standardised (i.e. pre-set, all participants answer the same questions), studies involving them can be easily replicated . This means the results can be confirmed by other researchers, strengthening certainty in the findings.
Weaknesses of questionnaires:
- Biased samples: Questionnaires handed out to people at random will select for participants who actually have the time and are willing to complete the questionnaire. As such, the responses may be biased towards those of people who e.g. have a lot of spare time.
- Dishonest answers: Participants may lie in their responses – particularly if the true answer is something they are embarrassed or ashamed of (e.g. on controversial or taboo topics like sex).
- Misunderstanding/differences in interpretation: Different participants may interpret the same question differently. For example, the “are you religious?” example above could be interpreted by one person to mean they go to church every Sunday and pray daily, whereas another person may interpret religious to mean a vague belief in the supernatural.
- Less detail: Interviews may be better suited for detailed information – especially on sensitive topics – than questionnaires. For example, participants are unlikely to write detailed descriptions of private experiences in a questionnaire handed to them on the street.
Interviews
In an interview, participants are asked questions in person. For example, Bowlby interviewed 44 children when studying the effects of maternal deprivation.
Interviews can be either structured or unstructured:
- Structured interview: Questions are standardised and pre-set. The interviewer asks all participants the same questions in the same order.
- Unstructured interview: The interviewer discusses a topic with the participant in a less structured and more spontaneous way, pursuing avenues of discussion as they come up.
Interviews can also be a cross between the two – these are called semi-structured interviews.
Strengths of interviews:
- More detail: Interviews – particularly unstructured interviews conducted by a skilled interviewer – enable researchers to delve deeper into topics of interest, for example by asking follow-up questions. Further, the personal touch of an interviewer may make participants more open to discussing personal or sensitive issues.
- Replicability: Structured interviews are easily replicated because participants are all asked the same pre-set list of questions. This replicability means the results can be confirmed by other researchers, strengthening certainty in the findings.
Weaknesses of interviews:
- Lack of quantifiable data: Although unstructured interviews enable researchers to delve deeper into interesting topics, this lack of structure may produce difficulties in comparing data between participants. For example, one interview may go down one avenue of discussion and another interview down a different avenue. This qualitative data may make objective or statistical analysis difficult.
- Interviewer effects: The interviewer’s appearance or character may bias the participant’s answers. For example, a female participant may be less comfortable answering questions on sex asked by a male interviewer and thus give different answers than if she were asked by a female interviewer.
Case studies
Note: This topic is A Level only – you don’t need to learn about case studies if you are taking the AS exam only.
Case studies are detailed investigations into an individual, a group of people, or an event. For example, the biopsychology page describes a case study of a young boy who had the left hemisphere of his brain removed and the effects this had on his language skills.
In a case study, researchers use many of the methods described above – observation, questionnaires, interviews – to gather data on a subject. However, because case studies are studies of a single subject, the data they provide is primarily qualitative rather than quantitative. This data is then used to build a case history of the subject. Researchers then interpret this case history to draw their conclusions.
Types of case study
Typical vs. unusual cases
Case studies can examine either typical or unusual cases, but most focus on unusual individuals, groups, and events.
Longitudinal
Many case studies are longitudinal . This means they take place over an extended time period, with researchers checking in with the subject at various intervals. For example, the case study of the boy who had his left hemisphere removed collected data on the boy’s language skills at ages 2.5, 4, and 14 to see how he progressed.
Strengths of case studies:
- Provides detailed qualitative data: Rather than focusing on one or two aspects of behaviour at a single point in time (e.g. in an experiment), case studies produce detailed qualitative data.
- Allows for investigation into issues that may be impractical or unethical to study otherwise. For example, it would be unethical to remove half a toddler’s brain just to experiment, but if such a procedure is medically necessary then researchers can use this opportunity to learn more about the brain.
Weaknesses of case studies:
- Lack of scientific rigour: Because case studies are often single examples that cannot be replicated, the results may not be valid when applied to the general population.
- Researcher bias: The small sample size of case studies also means researchers need to apply their own subjective interpretation when drawing conclusions from them. As such, these conclusions may be skewed by the researcher’s own bias and not be valid when applied more generally. This criticism is often directed at Freud’s psychoanalytic theory because it draws heavily on isolated case studies of individuals.
Scientific processes
This section looks at how science works more generally – in particular how scientific studies are organised and reported. It also covers ways of evaluating a scientific study.
Study features and design
Studies will usually have an aim. The aim of a study is a description of what the researchers are investigating and why. For example, “to investigate the effect of SSRIs on symptoms of depression” or “to understand the effect uniforms have on obedience to authority”.
Studies seek to test a hypothesis. The experimental/alternate hypothesis of a study is a testable prediction of what the researchers expect to happen.
- Experimental/alternate hypothesis: E.g. “SSRIs will reduce symptoms of depression” or “subjects are more likely to comply when orders are issued by someone wearing a uniform”
- Null hypothesis: E.g. “SSRIs have no effect on symptoms of depression” or “subject compliance will be the same when orders are issued by someone wearing a uniform as when orders are issued by someone not wearing a uniform”
Either the experimental/alternate hypothesis or the null hypothesis will be supported by the results of the experiment.
It’s often not possible or practical to conduct research on everyone your study is supposed to apply to. So, researchers use sampling to select participants for their study.
- Target population: The entire group the study’s findings are intended to apply to. E.g. all humans, all women, all men, all children, etc.
- Sample: The subset of the target population who actually take part in the study. E.g. 10,000 humans, 200 women from the USA, children at a certain school
For example, the target population (i.e. who the results apply to) of Asch’s conformity experiments is all humans – but Asch didn’t conduct the experiment on that many people! Instead, Asch recruited 123 males and generalised the findings from this sample to the rest of the population.
Researchers choose from different sampling techniques – each has strengths and weaknesses.
Sampling techniques
Random sampling
The random sampling method involves selecting participants from a target population at random – such as by drawing names from a hat or using a computer program to select them. This method means each member of the population has an equal chance of being selected and thus is not subject to any bias.
Strengths of random sampling:
- Unbiased: Selecting participants by random chance reduces the likelihood that researcher bias will skew the results of the study.
- Representative: If participants are selected at random – particularly if the sample size is large – it is likely that the sample will be representative of the population as a whole. For example, if the ratio of men:women in a population is 50:50 and participants are selected at random, it is likely that the sample will also have a ratio of men to women that is 50:50.
Weaknesses of random sampling:
- Impractical: It’s often impractical/impossible to include all members of a target population for selection. For example, it wouldn’t be feasible for a study on women to include the name of every woman on the planet for selection. But even if this was done, the randomly selected women may not agree to take part in the study anyway.
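Random selection from a population list is easy to sketch in code. The Python example below is illustrative only – the population of 1,000 names and the sample size of 50 are invented:

```python
import random

# Sketch of random sampling: every member of the target population has an
# equal chance of selection. The population list is hypothetical.
population = [f"person_{i}" for i in range(1000)]

rng = random.Random(42)                # seeded so the example is reproducible
sample = rng.sample(population, k=50)  # draw 50 participants without replacement

print(len(sample))       # 50
print(len(set(sample)))  # 50 -- no one is selected twice
```

This is the computerised equivalent of drawing names from a hat: selection depends on chance alone, not on any characteristic of the participants.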
Systematic sampling
The systematic sampling method involves selecting participants from a target population at pre-set intervals. For example, selecting every 50th person from a list, or every 7th, or whatever the interval is.
Strengths of systematic sampling:
- Unbiased and representative: Like random sampling, selecting participants according to a numerical interval provides an objective means of selection that prevents researcher bias from skewing the sample. Further, because the sampling method is independent of any particular characteristic (besides the arbitrary characteristic of the participant’s position in the list), the sample is likely to be representative of the population as a whole.
Weaknesses of systematic sampling:
- Unexpected bias: Some characteristics could occur more or less frequently at certain intervals, making a sample that is selected based on that interval biased. For example, houses tend to have even numbers on one side of a road and odd numbers on the other. If one side of the road is more expensive than the other and you select every 4th house, say, then you will only select even-numbered houses from one side of the road – and this sample may not be representative of the road as a whole.
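Taking every nth member of an ordered list is a one-line operation. The sketch below is illustrative only – the population of 1,000 names and the interval of 50 are invented:

```python
# Sketch of systematic sampling: take every nth member from an ordered
# list of the target population. List and interval are hypothetical.
population = [f"person_{i}" for i in range(1000)]
interval = 50

sample = population[::interval]  # every 50th person, starting from the first

print(len(sample))  # 1000 / 50 = 20 participants
print(sample[:3])   # the first few selections
```

Note how the sample depends entirely on the ordering of the list – which is exactly why an interval that lines up with a pattern in that ordering (like house numbers) can introduce unexpected bias.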
Stratified sampling
The stratified sampling method involves dividing the population into relevant groups for study, working out what percentage of the population is in each group, and then randomly sampling the population according to these percentages.
For example, let’s say 20% of the population is aged 0-18, and 50% of the population is aged 19-65, and 30% of the population is aged >65. A stratified sample of 100 participants would randomly select 20x 0-18 year olds, 50x 19-65 year olds, and 30x people over 65.
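The arithmetic above can be sketched directly in Python. The percentages follow the worked example; the population lists themselves are invented for illustration:

```python
import random

# Sketch of stratified sampling using the shares from the example:
# 20% aged 0-18, 50% aged 19-65, 30% over 65. Population data is hypothetical.
strata = {"0-18": 0.20, "19-65": 0.50, "over 65": 0.30}
population = {group: [f"{group}_{i}" for i in range(1000)] for group in strata}

sample_size = 100
rng = random.Random(1)
sample = []
for group, share in strata.items():
    n = round(sample_size * share)                   # 20, 50, and 30 participants
    sample.extend(rng.sample(population[group], n))  # random within each stratum

print(len(sample))  # 100 participants, in proportion to the strata
```

Within each stratum the selection is still random – stratification only fixes how many participants each group contributes.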
Strengths of stratified sampling:
- Representative: The stratification is deliberately designed to yield a sample that is representative of the population as a whole. You won’t get people with certain characteristics being over- or under-represented within the sample.
- Unbiased: Because participants within each group are selected randomly, researcher bias is unable to skew who is included in the study.
Weaknesses of stratified sampling:
- Requires knowledge of population breakdown: Researchers need to accurately gauge what percentage of the population falls into what group. If the researchers get these percentages wrong, the sample will be biased and some groups will be over- or under-represented.
Opportunity and volunteer sampling
The opportunity and volunteer sampling methods select whoever is readily available or willing:
- Opportunity sampling: The researcher selects whoever happens to be available at the time. E.g. approaching people in the street and asking them to complete a questionnaire.
- Volunteer sampling: Participants put themselves forward in response to an invitation. E.g. placing an advert online inviting people to complete a questionnaire.
Strengths of opportunity and volunteer sampling:
- Quick and easy: Approaching participants (opportunity sampling) or inviting participants (volunteer sampling) is quick and straightforward. You don’t have to spend time compiling details of the target population (as in e.g. random or systematic sampling), nor do you have to spend time dividing participants according to relevant categories (as in stratified sampling).
- May be the only option: With natural experiments – where a variable changes as a result of something outside the researchers’ control – opportunity sampling may be the only viable sampling method. For example, researchers couldn’t randomly sample 10 cities from all the cities in the world and change the drug laws in those cities to see the effects – they don’t have that kind of power. However, if a city is naturally changing its drug laws anyway, researchers could use opportunity sampling to study that city for research.
Weaknesses of opportunity and volunteer sampling:
- Unrepresentative: The pool of participants will likely be biased towards certain kinds of people. For example, if you conduct opportunity sampling on a weekday at 10am, this sample will likely exclude people who are at work. Similarly, volunteer sampling is likely to exclude people who are too busy to take part in the study.
Independent vs. dependent variables
If the study involves an experiment, the researchers will alter an independent variable to measure its effects on a dependent variable:
- Independent variable: The variable the researchers manipulate. E.g. in Bickman’s study of the effects of uniforms on obedience, the independent variable was the uniform of the person giving orders.
- Dependent variable: The outcome the researchers measure. E.g. in the same study, the dependent variable was how many people followed the orders.
Extraneous and confounding variables
In addition to the variables actually being investigated (independent and dependent), there may be additional (unwanted) variables in the experiment. These additional variables are called extraneous variables.
Researchers must control for extraneous variables to prevent them from skewing the results and leading to false conclusions. When extraneous variables are not properly controlled for, they are known as confounding variables.
For example, if you’re studying the effect of caffeine on reaction times, it might make sense to conduct all experiments at the same time of day to prevent this extraneous variable from confounding the results. Reaction times change throughout the day, so if you test one group of subjects at 3pm and another group right before they go to bed, you may falsely attribute the second group’s slower reaction times to caffeine rather than to tiredness.
Operationalisation of variables
Operationalisation of variables is where researchers clearly and measurably define the variables in their study.
For example, an experiment on the effects of sleep (independent variable) on anxiety (dependent variable) would need to clearly operationalise each variable. Sleep could be defined by the number of hours spent in bed, but anxiety is a bit more abstract and so researchers would need to operationalise (i.e. define) anxiety such that it can be quantified in a measurable and objective way.
If variables are not properly operationalised, the experiment cannot be properly replicated, experimenters’ subjective interpretations may skew results, and the findings may not be valid.
Pilot studies
A pilot study is a small-scale practice run of the proposed research project. Researchers will use a small number of participants and run through the procedure with them. The purpose of this is to identify any problems or areas for improvement in the study design before conducting the research in full. A pilot study may also give an early indication of whether the results will be statistically significant.
For example, if a task is too easy for participants, or it’s too obvious what the real purpose of an experiment is, or questions in a questionnaire are ambiguous, then the results may not be valid . Conducting a pilot study first may save time and money as it enables researchers to identify and address such issues before conducting the full study on thousands of participants.
Study reporting
Features of a psychological report
The report of a psychological study (research paper) typically contains the following sections in the following order:
- Title: A short and clear description of the research.
- Abstract: A summary of the research. This typically includes the aim and hypothesis, methods, results, and conclusion.
- Introduction: Uses the funnel technique: a broad overview of the context (e.g. current theories, previous studies, etc.) before focusing in on this particular study, why it was conducted, and its aims and hypothesis.
- Study design: This will explain what method was used (e.g. experiment or observation), how the study was designed (e.g. independent groups or repeated measures), and the identification and operationalisation of variables.
- Participants: A description of the target population to be studied, the sampling method used, and how many participants were included.
- Equipment used: A description of any special equipment used in the study and how it was used.
- Standardised procedure: A detailed step-by-step description of how the study was conducted. This allows for the study to be replicated by other researchers.
- Controls: An explanation of how extraneous variables were controlled for so as to generate accurate results.
- Results: A presentation of the key findings from the data collected. This typically takes the form of written summaries of the raw data (descriptive statistics), which may also be presented in tables, charts, graphs, etc. The raw data itself is typically included in appendices.
- Discussion: An explanation of what the results mean and how they relate to the experimental hypothesis (supporting or contradicting it), any issues with how results were generated, how the results fit with other research, and suggestions for future research.
- Conclusion: A short summary of the key findings from the study.
- References: A list of the sources cited in the report. For example:
  - Book: Milgram, S., 2010. Obedience to Authority. 1st ed. Pinter & Martin.
  - Journal article: Bandura, A., Ross, D. and Ross, S., 1961. Transmission of Aggression through Imitation of Aggressive Models. The Journal of Abnormal and Social Psychology, 63(3), pp.575-582.
- Appendices: This is where you put any supporting materials that are too detailed or long to include in the main report. For example, the raw data collected from a study, or the complete list of questions in a questionnaire .
Peer review
Peer review is a way of assessing the scientific credibility of a research paper before it is published in a scientific journal. The aim of peer review is to prevent false ideas and bad research from being accepted as fact.
It typically works as follows: The researchers submit their paper to the journal they want it to be published in, and the editor of that journal sends the paper to expert reviewers (i.e. psychologists who are experts in that area – the researchers’ ‘peers’) who evaluate the paper’s scientific validity. The reviewers may accept the paper as it is, accept it with a few changes, reject it and suggest revisions and resubmission at a later date, or reject it completely.
There are several different methods of peer review:
- Open review: The researchers and the reviewers are known to each other.
- Single-blind: The researchers do not know the names of the reviewers. This prevents the researchers from being able to influence the reviewer. This is the most common form of peer review.
- Double-blind: The researchers do not know the names of the reviewers, and the reviewers do not know the names of the researchers. This additionally prevents the reviewer’s bias towards the researcher from influencing their decision whether to accept their paper or not.
Criticisms of peer review:
- Bias: There are several ways peer review can be subject to bias. For example, academic research (particularly in niche areas) takes place among a fairly small circle of people who know each other and so these relationships may affect publication decisions. Further, many academics are funded by organisations and companies that may prefer certain ideas to be accepted as scientifically legitimate, and so this funding may produce conflicts of interest.
- Doesn’t always prevent fraudulent/bad research from being published: There are many documented examples of fraudulent research passing peer review and being published.
- Prevents progress of new ideas: Reviewers of papers are typically older and established academics who have made their careers within the current scientific paradigm. As such, they may reject new or controversial ideas simply because they go against the current paradigm rather than because they are unscientific.
- Plagiarism: In single-blind and double-blind peer reviews, the reviewer may use their anonymity to reject or delay a paper’s publication and steal the good ideas for themself.
- Slow: Peer review can mean it takes months or even years between the researcher submitting a paper and its publication.
Study evaluation
In psychological studies, ethical issues are questions of what is morally right and wrong. An ethically-conducted study will protect the health and safety of the participants involved and uphold their dignity, privacy, and rights.
To provide guidance on this, the British Psychological Society has published a code of human research ethics :
- Participants are told the project’s aims , the data being collected, and any risks associated with participation.
- Participants have the right to withdraw or modify their consent at any time.
- Researchers can use incentives (e.g. money) to encourage participation, but these incentives can’t be so big that they would compromise a participant’s freedom of choice.
- Researchers must consider the participant’s ability to consent (e.g. age, mental ability, etc.)
- Prior (general) consent: Where deception is necessary, one option is informing participants in advance that they will be deceived without telling them the nature of the deception. However, this may affect their behaviour as they try to guess the real nature of the study.
- Retrospective consent: Informing participants that they were deceived after the study is completed and asking for their consent. The problem with this is that if they don’t consent then it’s too late.
- Presumptive consent: Asking people who aren’t participating in the study if they would be willing to participate in the study. If these people would be willing to give consent, then it may be reasonable to assume that those taking part in the study would also give consent.
- Confidentiality: Personal data obtained about participants should not be disclosed (unless the participant agreed to this in advance). Any data that is published will not be publicly identifiable as the participant’s.
- Debriefing: Once data gathering is complete, researchers must explain all relevant details of the study to participants – especially if deception was involved. If a study might have harmed the individual (e.g. its purpose was to induce a negative mood), it is ethical for the debrief to address this harm (e.g. by inducing a happy mood) so that the participant does not leave the study in a worse state than when they entered.
Reliability
Study results are reliable if the same results can be consistently replicated under the same circumstances. If results are inconsistent then the study is unreliable.
Note: Just because a study is reliable, its results are not automatically valid . A broken tape measure may reliably (i.e. consistently) record a person’s height as 200m, but that doesn’t mean this measurement is accurate.
There are several ways researchers can assess a study’s reliability:
Test-retest
Test-retest is when you give the same test to the same person on two different occasions. If the results are the same or similar both times, this suggests they are reliable.
For example, if your study used scales to measure participants’ weight, you would expect the scales to record the same (or a very similar) weight for the same person in the morning as in the evening. If the scales said the person weighed 100kg more later that same day, the scales (and therefore the results of the study) would be unreliable.
Inter-observer
Inter-observer reliability is a way to test the reliability of observational studies : it checks whether different observers watching the same behaviour record it in the same way.
For example, if your study required observers to assess participants’ anxiety levels, you would expect different observers to grade the same behaviour in the same way. If one observer rated a participant’s behaviour a 3 for anxiety, and another observer rated the exact same behaviour an 8, the results would be unreliable.
Inter-observer reliability can be assessed mathematically by looking for correlation between observers’ scores. Inter-observer reliability can be improved by setting clearly defined behavioural categories .
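Inter-observer agreement can be checked with a correlation coefficient. A minimal sketch in Python, using made-up anxiety ratings from two hypothetical observers:

```python
# Sketch: assessing inter-observer reliability by correlating two
# observers' ratings of the same participants (ratings are invented).

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

observer_a = [3, 5, 2, 8, 6, 4]   # hypothetical anxiety ratings (1-10)
observer_b = [4, 5, 2, 7, 6, 3]

print(round(pearson_r(observer_a, observer_b), 2))  # 0.94
```

A coefficient close to +1 suggests the two observers grade behaviour consistently; a low coefficient would suggest the behavioural categories need tightening.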
Validity
Study results are valid if they accurately measure what they are supposed to. There are several ways researchers can assess a study’s validity:
- Concurrent validity: E.g. let’s say you come up with a new test to measure participants’ intelligence levels. If participants scoring highly on your test also scored highly on a standardised IQ test and vice versa, that would suggest your test has concurrent validity because participants’ scores are correlated with a known accurate test.
- Face validity: E.g. a study that measures participants’ intelligence levels by asking them when their birthday is would not have face validity. Getting participants to complete a standardised IQ test would have greater face validity.
- Ecological validity: E.g. let’s say your study was supposed to measure aggression levels in response to someone annoying. If the study was conducted in a lab and the participant knew they were taking part in a study, the results probably wouldn’t have much ecological validity because of the unrealistic environment.
- Temporal validity: E.g. a study conducted in 1920 that measured participants’ attitudes towards social issues may have low temporal validity because societal attitudes have changed since then.
Control of extraneous variables
There are several different types of extraneous variables that can reduce the validity of a study. A well-conducted psychological study will control for these extraneous variables so that they do not skew the results.
Demand characteristics
Demand characteristics are extraneous variables where the demands of a study make participants behave in ways they wouldn’t behave outside of the study. This reduces the study’s ecological validity .
For example, if a participant guesses the purpose of an experiment they are taking part in, they may try to please the researcher by behaving in the ‘right’ way rather than the way they would naturally. Alternatively, the participant might rebel against the study and deliberately try to sabotage it (e.g. by deliberately giving wrong answers).
In some study designs, researchers can control for demand characteristics using single-blind methods. For example, a drug trial could give half the participants the actual drug and the other half a placebo but not tell participants which treatment they received. This way, both groups will have equal demand characteristics and so any differences between them should be down to the drug itself.
Investigator effects
Investigator effects are another extraneous variable where the characteristics of the researcher affect the participant’s behaviour. Again, this reduces the study’s ecological validity .
Many characteristics – e.g. the researcher’s age, gender, accent, what they’re wearing – could potentially influence the participant’s responses. For example, in an interview about sex, females may feel less comfortable answering questions asked by a male interviewer and thus give different answers than if they were asked by a female. The researcher’s biases may also come across in their body language or tone of voice, affecting the participant’s responses.
In some study designs, researchers can control for investigator effects using double-blind methods. In a double-blind drug trial, for example, neither the participants nor the researchers know which participants get the actual drug and which get the placebo. This way, the researcher is unable to give any clues (consciously or unconsciously) to participants that would affect their behaviour.
Participant variables
Participant variables are differences between participants. These can be controlled for by random allocation .
For example, in an experiment on the effect of caffeine on reaction times, participants would be randomly allocated into either the caffeine group or the non-caffeine group. A non -random allocation method, such as allocating caffeine to men and placebo to women, could mean variables in the allocation method (in this case gender) skew the results. When participants are randomly allocated, any extraneous variables (e.g. gender in this case) will be allocated evenly between each group and so not skew the results of one group more than the other.
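Random allocation like this is straightforward to sketch in code. The participant labels and group sizes below are made up for illustration:

```python
import random

# Sketch of random allocation: shuffle the participants, then split them
# into the caffeine and placebo groups (participant labels are invented).
participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

random.shuffle(participants)
caffeine_group = participants[: len(participants) // 2]
placebo_group = participants[len(participants) // 2 :]

print(len(caffeine_group), len(placebo_group))  # 4 4
```

Because the shuffle is random, extraneous participant variables (e.g. gender) should end up spread roughly evenly across both groups.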
Situational variables
Situational variables are the environment the experiment is conducted in. These can be controlled for by standardisation .
For example, all the tests of caffeine on reaction times would be conducted in the same room, at the same time of day, using the same equipment, and so on to prevent these features of the environment from skewing the results.
In a repeated measures experiment, researchers may use counterbalancing to control for the order in which tasks are completed.
For example, half of participants would do task A followed by task B, and the other half would do task B followed by task A.
Implications of psychological research for the economy
Psychological research often has practical applications in real life. The following are some examples of how psychological findings may affect the economy:
- Attachment : Bowlby’s maternal deprivation hypothesis suggests that periods of extended separation between mother and child before age 3 are harmful to the child’s psychological development. And if mothers stay at home during this period, they can’t go out to work. However, some more recent research challenges Bowlby’s conclusions, suggesting that substitutes (e.g. the father , or nursery care) can care for the child, allowing the mother to go back to work sooner and remain economically active.
- Depression : Psychological research has found effective therapies for treating depression, such as cognitive behavioural therapy and SSRIs. The benefits of such therapies – if they are effective – are likely to outweigh the costs because they enable the person to return to work and pay taxes, as well avoiding long-term costs to the health service.
- OCD : Similar to above: Drug therapies (e.g. SSRIs) and behavioural approaches (e.g. CBT) may alleviate OCD symptoms, enabling OCD sufferers to return to work, pay taxes, and avoid reliance on healthcare services.
- Memory : Public money is required to fund police investigations. Psychological tools, such as the cognitive interview , have improved the accuracy of eyewitness testimonies, which equates to more efficient use of police time and resources.
Features of science
Theory construction and hypothesis testing.
Science works by making empirical observations of the world, formulating hypotheses /theories that explain these observations, and repeatedly testing these hypotheses /theories via experimentation.
- E.g. A tape measure provides a more objective measurement of something compared to a researcher’s guess. Similarly, a set of scales is a more objective way of determining which of two objects is heavier than a researcher lifting each up and giving their opinion.
- E.g. Burger (2009) replicated Milgram’s experiments with similar results.
- E.g. The hypothesis that “water boils at 100°C” could be falsified by an experiment where you heated water to 999°C and it didn’t boil. In contrast, “everything doubles in size every 10 seconds” could not be falsified by any experiment because whatever equipment you used to measure everything would also double in size.
- Freud’s psychodynamic theories are often criticised for being unfalsifiable: There’s not really any observations that could disprove them because every possible behaviour (e.g. crying or not crying) could be explained as the result of some unconscious thought process.
Paradigm shifts
Philosopher Thomas Kuhn argues that science is not as unbiased and objective as it seems. Instead, the majority of scientists just accept the existing scientific theories (i.e. the existing paradigm) as true and then find data that supports these theories while ignoring/rejecting data that refutes them.
Rarely, though, minority voices are able to successfully challenge the existing paradigm and replace it with a new one. When this happens it is a paradigm shift . An example of a paradigm shift in science is that from Newtonian gravity to Einstein’s theory of general relativity.
Data handling and analysis
Types of data, quantitative vs. qualitative.
Data from studies can be quantitative or qualitative :
- Quantitative: Numerical
- Qualitative: Non-numerical
For example, some quantitative data in the Milgram experiment would be how many subjects delivered a lethal shock. In contrast, some qualitative data would be asking the subjects afterwards how they felt about delivering the lethal shock.
Strengths of quantitative data / weaknesses of qualitative data:
- Can be compared mathematically and scientifically: Quantitative data enables researchers to mathematically and objectively analyse data. For example, mood ratings of 7 and 6 can be compared objectively, whereas qualitative assessments such as ‘sad’ and ‘unhappy’ are hard to compare scientifically.
Weaknesses of quantitative data / strengths of qualitative data:
- Less detailed: In reducing data to numbers and narrow definitions, quantitative data may miss important details and context.
Content analysis
Although the detail of qualitative data may be valuable, this level of detail can also make it hard to objectively or mathematically analyse. Content analysis is a way of converting qualitative data into quantitative data so it can be analysed. The process is as follows:
- Gather the qualitative data: E.g. a set of unstructured interviews on the topic of childhood
- Identify coding categories: E.g. discussion of traumatic events, happy memories, births, and deaths
- Count how often each category occurs in the data: E.g. researchers listen to the unstructured interviews and count how often traumatic events are mentioned
- Statistical analysis is carried out on this quantitative data
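The counting step of a content analysis can be sketched as a simple keyword tally. The transcripts and category keywords below are invented for illustration:

```python
# Sketch of the counting step of a content analysis: tallying how often
# each coding category appears across interview transcripts.
transcripts = [
    "my happiest memory is the birth of my sister",
    "the accident was traumatic and the memory still upsets me",
    "a traumatic event overshadowed an otherwise happy childhood",
]

# Hypothetical coding categories, each defined by a few keywords.
categories = {
    "traumatic events": ["traumatic", "accident"],
    "happy memories": ["happy", "happiest"],
}

counts = {name: 0 for name in categories}
for transcript in transcripts:
    for name, keywords in categories.items():
        counts[name] += sum(1 for word in transcript.split() if word in keywords)

print(counts)  # {'traumatic events': 3, 'happy memories': 2}
```

The resulting counts are quantitative data, which can then be analysed statistically.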
Primary vs. secondary
Researchers can produce primary data or use secondary data to achieve the research aims of their study:
- Primary data: Original data collected for the study
- Secondary data: Data from another study previously conducted
Meta-analysis
A meta-analysis is a study of studies. It involves taking several smaller studies within a certain research area and using statistics to identify similarities and trends within those studies to create a larger study.
We have looked at some examples of meta-analyses elsewhere in the course such as Van Ijzendoorn’s meta-analysis of several strange situation studies and Grootheest et al’s meta-analysis of twin studies on OCD .
A good meta-analysis is often more reliable than a regular study because it is based on a larger data set, and any issues with one single study will be balanced out by the other studies.
Descriptive statistics
Measures of central tendency: mean, median, mode.
Mean , median , and mode are measures of central tendency . In other words, they are ways of reducing large data sets into averages .
The mean is calculated by adding all the numbers in a set together and dividing the total by the number of numbers.
- Example set: 22, 78, 3, 33, 90
- 22+78+3+33+90=226
- The mean is 45.2
Strengths:
- Uses all data in the set.
- Accurate: Provides a precise number based on all the data in a set.
Weaknesses:
- Can be skewed by freak scores: E.g. for the set 1, 3, 2, 5, 9, 4, 913, the mean is 133.9, but the 913 could be a measurement error, and so the mean is not representative of the data set.
The median is calculated by arranging all the numbers in a set from smallest to biggest and then finding the number in the middle. Note: If the total number of numbers is odd, you just pick the middle one. But if the total number of numbers is even, you take the mid-point between the two numbers in the middle.
- Example set: 20, 66, 85, 45, 18, 13, 90, 28, 9
- 9, 13, 18, 20, 28 , 45, 66, 85, 90
- The median is 28
Strengths:
- Won’t be skewed by freak scores (unlike the mean).
Weaknesses:
- May not be representative: E.g. for the set 1, 1, 3, 9865, 67914, the median of 3 is not really representative of the larger numbers in the set.
- Less accurate/sensitive than the mean.
The mode is calculated by counting which is the most commonly occurring number in a set.
- Example set: 7, 7, 20 , 16, 1, 20 , 25, 16, 20 , 9
- There are two 7’s and two 16’s, but three 20’s
- The mode is 20
Strengths:
- Makes more sense for presenting the central tendency in data sets with whole numbers. For example, the average number of limbs for a human being will have a mean of something like 3.99, but a mode of 4.
Weaknesses:
- Does not use all the data in a set.
- A data set may have more than one mode.
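The three worked examples above can be reproduced with Python’s statistics module:

```python
from statistics import mean, median, multimode

# The example sets from the worked calculations above.
mean_set = [22, 78, 3, 33, 90]
median_set = [20, 66, 85, 45, 18, 13, 90, 28, 9]
mode_set = [7, 7, 20, 16, 1, 20, 25, 16, 20, 9]

print(mean(mean_set))       # 45.2
print(median(median_set))   # 28
print(multimode(mode_set))  # [20] -- multimode returns a list, because
                            # a data set may have more than one mode
```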
Measures of dispersion: Range and standard deviation
Range and standard deviation are measures of dispersion . In other words, they quantify how much scores in a data set vary .
The range is calculated by subtracting the smallest number in the data set from the largest number.
- Example set: 59, 8, 7, 84, 9, 49, 14, 75, 88, 11
- The largest number is 88
- The smallest number is 7
- The range is 81
Strengths:
- Easy and quick to calculate: You just subtract one number from another.
- Accounts for freak scores (highest and lowest).
Weaknesses:
- Can be skewed by freak scores: The difference between the biggest and smallest numbers can be skewed by a single anomalous result or error, which may give an exaggerated impression of the data distribution compared to standard deviation .
- Only uses two values: E.g. the sets 4, 4, 5, 5, 5, 6, 6, 7, 19 and 4, 16, 16, 17, 17, 17, 18, 19, 19 both have a range of 15, yet the scores in them are distributed very differently.
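The range calculations above in code, including the two sets that share a range despite very different distributions:

```python
# The worked range example from above.
data = [59, 8, 7, 84, 9, 49, 14, 75, 88, 11]
print(max(data) - min(data))  # 88 - 7 = 81

# Two sets with the same range but very different distributions,
# showing how little the range says about the data between the extremes.
set_a = [4, 4, 5, 5, 5, 6, 6, 7, 19]
set_b = [4, 16, 16, 17, 17, 17, 18, 19, 19]
print(max(set_a) - min(set_a), max(set_b) - min(set_b))  # 15 15
```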
Standard deviation
The standard deviation (σ) is a measure of how much numbers in a data set deviate from the mean (average). It is calculated as follows:
- Example data set: 59, 79, 43, 42, 81, 100, 38, 54, 92, 62
- Calculate the mean: 65
- Subtract the mean from each number in the set: -6, 14, -22, -23, 16, 35, -27, -11, 27, -3
- Square each of these differences: 36, 196, 484, 529, 256, 1225, 729, 121, 729, 9
- Add the squared differences together: 36+196+484+529+256+1225+729+121+729+9=4314
- Divide the total by the number of numbers in the set: 4314/10=431.4
- Take the square root: √431.4=20.77
- The standard deviation is 20.77
Note: This method calculates the standard deviation of an entire population. When calculating the standard deviation of a sample, you instead divide by the number of numbers minus 1 in the second-to-last step (in this case 4314/9=479.3). This gives a standard deviation of 21.89.
Strengths:
- Is less skewed by freak scores: Standard deviation measures the average difference from the mean and so is less likely to be skewed by a single freak score (compared to the range ).
Weaknesses:
- Takes longer to calculate than the range .
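Both versions of the standard deviation calculation above can be checked with Python’s statistics module (pstdev divides by n for a population, stdev by n-1 for a sample):

```python
from statistics import pstdev, stdev

# The example data set from the worked calculation above.
data = [59, 79, 43, 42, 81, 100, 38, 54, 92, 62]

print(round(pstdev(data), 2))  # 20.77 (population standard deviation)
print(round(stdev(data), 2))   # 21.89 (sample standard deviation, n-1)
```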
Percentages
A percentage (%) describes how much out of 100 something occurs. It is calculated as follows:
- Example: 63 out of a total of 82 participants passed the test
- 63/82=0.768
- 0.768*100=76.8
- 76.8% of participants passed the test
Percentage change
To calculate a percentage change, work out the difference between the original number and the after number, divide that difference by the original number, then multiply the result by 100:
- Example: He got 80 marks on the test but after studying he got 88 marks on the test
- 88-80=8
- 8/80=0.1
- 0.1*100=10
- His test score increased by 10% after studying
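Both percentage calculations can be sketched as small helper functions (the function names are my own):

```python
def percentage(part, whole):
    """What percentage of the whole the part represents."""
    return part / whole * 100

def percentage_change(before, after):
    """Percentage change from the original (before) value."""
    return (after - before) / before * 100

print(round(percentage(63, 82), 1))        # 76.8 (% of participants who passed)
print(round(percentage_change(80, 88), 1)) # 10.0 (% increase in test score)
```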
Normal and skewed distributions
Normal distribution.
A data set that has a normal distribution will have the majority of scores on or near the mean average. A normal distribution is also symmetrical: There are an equal number of scores above the mean as below it. In a normal distribution, scores become rarer and rarer the more they deviate from the mean.
An example of a normal distribution is IQ scores. As you can see from the histogram below, there are as many IQ scores below the mean as there are above the mean :

When plotted on a histogram , data that follows a normal distribution will form a bell-shaped curve like the one above.
Skewed distribution

Skewed distributions are caused by outliers: Freak scores that throw off the mean . Skewed distributions can be positive or negative :
- Positive skew (a long tail of high scores to the right): Mean > Median > Mode
- Negative skew (a long tail of low scores to the left): Mean < Median < Mode
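A quick check of the Mean > Median > Mode ordering, using a small made-up positively skewed data set (the single high score of 10 drags the mean up):

```python
from statistics import mean, median, mode

# An invented, positively skewed data set: most scores are low,
# with one freak high score pulling the mean upwards.
data = [1, 2, 2, 2, 3, 3, 4, 10]
print(mean(data), median(data), mode(data))  # 3.375 2.5 2
```

As the output shows, mean (3.375) > median (2.5) > mode (2), the signature of a positive skew.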
Correlation
Correlation refers to how closely two (or more) variables are related. For example, hot weather and ice cream sales may be positively correlated: When temperatures go up, so do ice cream sales.
Correlations are measured mathematically using correlation coefficients (r). A correlation coefficient will be anywhere between +1 and -1:
- r=+1 means two things are perfectly positively correlated: When one goes up , so does the other by the same amount
- r=-1 means two things are perfectly negatively correlated: When one goes up , the other goes down by the same amount
- r=0 means two things are not correlated at all: A change in one is totally independent of a change in the other
The following scattergrams illustrate various correlation coefficients:

Presentation of data

For example, the behavioural categories table above presents the raw data of each student in this made-up study. But in the results section, researchers might include another table that compares average anxiety rating scores for males and females.
Scattergrams

For example, each dot on the correlation scattergram opposite could represent a student. The x-axis could represent the number of hours the student studied, and the y-axis could represent the student’s test score.

For example, the results of Loftus and Palmer’s study into the effects of different leading questions on memory could be presented using the bar chart above. It’s not like there are categories in-between ‘contacted’ and ‘hit’, so the bars have gaps between them (unlike a histogram ).
A histogram is a bit like a bar chart but is used to illustrate continuous or interval data (rather than discrete data or whole numbers).

Because the data on the x axis is continuous, there are no gaps between the bars.

For example, the line graph above illustrates 3 different people’s progression in a strength training program over time.

For example, the frequency with which different attachment styles occurred in Ainsworth’s strange situation could be represented by the pie chart opposite.
Inferential testing
Probability and significance.
The point of inferential testing is to see whether a study’s results are statistically significant , i.e. whether any observed effects are as a result of whatever is being studied rather than just random chance.
For example, let’s say you are studying whether flipping a coin outdoors increases the likelihood of getting heads. You flip the coin 100 times and get 52 heads and 48 tails. Assuming a baseline expectation of 50:50, you might take these results to mean that flipping the coin outdoors does increase the likelihood of getting heads. However, from 100 coin flips, a ratio of 52:48 between heads and tails is not very significant and could have occurred due to luck. So, the probability that this difference in heads and tails is because you flipped the coin outside (rather than just luck) is low.
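The coin example can be checked exactly with a binomial calculation: the probability of getting 52 or more heads from 100 fair flips purely by chance.

```python
from math import comb

# Exact binomial probability of observing at least 52 heads in 100 fair
# coin flips: sum the count of ways to get k heads for k = 52..100,
# divided by the 2^100 equally likely outcomes.
n, heads = 100, 52
p_at_least = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n
print(round(p_at_least, 3))  # 0.382: far above 0.05, so not significant
```

A 52:48 split (or more extreme) happens by luck roughly 38% of the time, so this result would not come close to the usual p < 0.05 threshold.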
Probability is denoted by the symbol p . The lower the p value, the more statistically significant your results are. You can never get a p value of 0, though, so researchers will set a threshold at which point the results are considered statistically significant enough to reject the null hypothesis . In psychology, this threshold is usually <0.05, which means there is a less than 5% chance the observed effect is due to luck and a >95% chance it is a real effect.
Type 1 and type 2 errors
When interpreting statistical significance, there are two types of errors:
- Type 1 error (false positive): Rejecting the null hypothesis when it is actually true. E.g. The p threshold is <0.05, but the researchers’ results are among the 5% of fluke outcomes that look significant but are just due to luck
- Type 2 error (false negative): Accepting the null hypothesis when it is actually false. E.g. The p threshold is set very low (e.g. <0.01), and the data falls just short (e.g. p=0.02)
Increasing the sample size reduces the likelihood of type 1 and type 2 errors.
Types of statistical test
Note: The inferential tests below are needed for A level only. If you are taking the AS exam , you only need to know the sign test .
There are several different types of inferential test in addition to the sign test . Which inferential test is best for a study will depend on the following three criteria:
- Whether you are looking for a difference or a correlation
- The level of measurement of the data:
- Nominal (categories): E.g. at the competition there were 8 runners, 12 swimmers, and 6 long jumpers (there are no in-between measurements between ‘swimmer’ and ‘runner’)
- Ordinal (ranked): E.g. first, second, and third place in a race, or ranking your mood on a scale of 1-10
- Interval (fixed units of equal size): E.g. weights in kg, heights in cm, or times in seconds
- Whether the experimental design is related (i.e. repeated measures ) or unrelated (i.e. independent groups )
The following table shows which inferential test is appropriate according to these criteria:
Note: You won’t have to work out all these tests from scratch, but you may need to:
- Say which of the statistical tests is appropriate (i.e. based on whether it’s a difference or correlation; whether the data is nominal, ordinal, or interval; and whether the data is related or unrelated).
- Identify the critical value from a critical values table and use this to say whether a result (which will be given to you in the exam) is statistically significant.
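As a revision aid, the standard choice-of-test table can be written as a lookup keyed on the three criteria. Treat this as a sketch of the usual AQA table, not a substitute for the table given in the exam:

```python
# (difference or correlation, level of measurement, related or unrelated) -> test
TEST_TABLE = {
    ("difference", "nominal", "unrelated"): "Chi-squared",
    ("difference", "nominal", "related"): "Sign test",
    ("difference", "ordinal", "unrelated"): "Mann-Whitney",
    ("difference", "ordinal", "related"): "Wilcoxon",
    ("difference", "interval", "unrelated"): "Unrelated t-test",
    ("difference", "interval", "related"): "Related t-test",
    ("correlation", "nominal", "related"): "Chi-squared",
    ("correlation", "ordinal", "related"): "Spearman's rho",
    ("correlation", "interval", "related"): "Pearson's r",
}

def choose_test(looking_for, level, design):
    """Pick the appropriate inferential test from the three criteria."""
    return TEST_TABLE[(looking_for, level, design)]

print(choose_test("difference", "nominal", "related"))  # Sign test
```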
The sign test
The sign test is a way to calculate the statistical significance of differences between related pairs of nominal data (e.g. before and after in a repeated measures experiment ). If the observed value (s) is equal to or less than the critical value (cv), the results are statistically significant.
Example: Let’s say we ran an experiment on 10 participants to see whether they prefer movie A or movie B .
- Work out n, excluding any participants who showed no difference: n = 9 (because even though there are 10 participants, one participant had no change so we exclude them from our calculation)
- Determine whether the hypothesis is directional or non-directional: In this case our experimental hypothesis is two-tailed: Participants may prefer movie A or movie B
- (The null hypothesis is that participants like both movies equally)
- Choose a significance level: In this case, let’s say it’s 0.1
- Look up the critical value in a critical values table, using n, the significance level, and whether the experimental hypothesis is one- or two-tailed: In this example, our critical value (cv) is 1
- Count the less frequent sign to get the observed value: In this example, there are 2 As, so our observed value (s) is 2
- Compare the observed value with the critical value: In this example, the observed value (2) is greater than the critical value (1) and so the results are not statistically significant. This means we must accept the null hypothesis and reject the experimental hypothesis .
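The worked example above as code. The preference data is hypothetical, chosen to match the counts in the example (n = 9, two As), and the critical value is taken from a critical values table as in the example:

```python
# Sign test sketch: count the signs, take the less frequent one as the
# observed value, and compare it with the critical value.
# '=' marks a participant whose preference did not change.
preferences = ["B", "A", "B", "B", "=", "B", "A", "B", "B", "B"]

signs = [p for p in preferences if p != "="]
n = len(signs)                               # 9: no-change participants excluded
s = min(signs.count("A"), signs.count("B"))  # observed value: less frequent sign
critical_value = 1                           # from a critical values table (n=9, two-tailed)

significant = s <= critical_value
print(n, s, significant)  # 9 2 False -> not statistically significant
```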
Research Methods In Psychology
Saul Mcleod, PhD
Educator, Researcher
BSc (Hons) Psychology, MRes, PhD, University of Manchester
Saul Mcleod, Ph.D., is a qualified psychology teacher with over 18 years experience of working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.
Olivia Guy-Evans, MSc
Associate Editor for Simply Psychology
BSc (Hons) Psychology, MSc Psychology of Education
Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.
Research methods in psychology are systematic procedures used to observe, describe, predict, and explain behavior and mental processes. They include experiments, surveys, case studies, and naturalistic observations, ensuring data collection is objective and reliable to understand and explain psychological phenomena.

Hypotheses are statements that predict the results of a study and can be verified or disproved by some kind of investigation.
There are four types of hypotheses :
- Null Hypotheses (H0 ) – these predict that no difference will be found in the results between the conditions. Typically these are written ‘There will be no difference…’
- Alternative Hypotheses (Ha or H1) – these predict that there will be a significant difference in the results between the two conditions. This is also known as the experimental hypothesis.
- One-tailed (directional) hypotheses – these state the specific direction the researcher expects the results to move in, e.g. higher, lower, more, less. In a correlation study, the predicted direction of the correlation can be either positive or negative.
- Two-tailed (non-directional) hypotheses – these state that a difference will be found between the conditions of the independent variable but do not state the direction of the difference or relationship. These are typically written ‘There will be a difference…’
All research has an alternative hypothesis (either a one-tailed or two-tailed) and a corresponding null hypothesis.
Once the research is conducted and results are found, psychologists must accept one hypothesis and reject the other.
So if a difference is found, the Psychologist would accept the alternative hypothesis and reject the null. The opposite applies if no difference is found.
Sampling techniques
Sampling is the process of selecting a representative group from the population under study.

A sample is the participants you select from a target population (the group you are interested in) to make generalisations about.
Representative means the extent to which a sample mirrors a researcher’s target population and reflects its characteristics.
Generalisability means the extent to which their findings can be applied to the larger population of which their sample was a part.
- Volunteer sample : where participants pick themselves through newspaper adverts, noticeboards or online.
- Opportunity sampling : also known as convenience sampling , uses people who are available at the time the study is carried out and willing to take part. It is based on convenience.
- Random sampling : when every person in the target population has an equal chance of being selected. An example of random sampling would be picking names out of a hat.
- Systematic sampling : when a system is used to select participants. Picking every Nth person from all possible participants. N = the number of people in the research population / the number of people needed for the sample.
- Stratified sampling : when you identify the subgroups and select participants in proportion to their occurrences.
- Snowball sampling : when researchers find a few participants, and then ask them to find participants themselves and so on.
- Quota sampling : when researchers will be told to ensure the sample fits with certain quotas, for example they might be told to find 90 participants, with 30 of them being unemployed.
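The difference between random and systematic selection can be sketched in a few lines of Python; the participant list and sample size below are invented for illustration:

```python
import random

# Hypothetical population of 20 participants (names invented for the example).
population = [f"participant_{i}" for i in range(1, 21)]
sample_size = 5

# Random sampling: every member has an equal chance of being selected.
random_sample = random.sample(population, sample_size)

# Systematic sampling: pick every Nth person, where
# N = population size / sample size.
n = len(population) // sample_size          # here N = 4
systematic_sample = population[::n][:sample_size]

print(systematic_sample)
# ['participant_1', 'participant_5', 'participant_9', 'participant_13', 'participant_17']
```

Note that systematic sampling is only as unbiased as the ordering of the list: if the list has a repeating pattern that lines up with N, the sample can be skewed.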
Experiments always have an independent and dependent variable .
- The independent variable is the one the experimenter manipulates (the thing that changes between the conditions the participants are placed into). It is assumed to have a direct effect on the dependent variable.
- The dependent variable is the thing being measured, or the results of the experiment.

Operationalization of variables means making them measurable/quantifiable. We must use operationalization to ensure that variables are in a form that can be easily tested.
For instance, we can’t really measure ‘happiness’ but we can measure how many times a person smiles within a two hour period.
By operationalizing variables, we make it easy for someone else to replicate our research. Remember, this is important because we can check if our findings are reliable.
Extraneous variables are all variables, which are not the independent variable, but could affect the results of the experiment.
It can be a natural characteristic of the participant, such as intelligence levels, gender, or age for example, or it could be a situational feature of the environment such as lighting or noise.
Demand characteristics are a type of extraneous variable that occurs when participants work out the aims of the research study and begin to behave in the way they think is expected of them.
For example, in Milgram’s research , critics argued that participants worked out that the shocks were not real and they administered them as they thought this was what was required of them.
Extraneous variables must be controlled so that they do not affect (confound) the results.
Randomly allocating participants to their conditions or using a matched pairs experimental design can help to reduce participant variables.
Situational variables are controlled by using standardized procedures, ensuring every participant in a given condition is treated in the same way.
Experimental Design
Experimental design refers to how participants are allocated to each condition of the independent variable, such as a control or experimental group.
- Independent design ( between-groups design ): each participant is selected for only one group. With the independent design, the most common way of deciding which participants go into which group is by means of randomization.
- Matched participants design : each participant is selected for only one group, but the participants in the two groups are matched for some relevant factor or factors (e.g. ability; sex; age).
- Repeated measures design ( within groups) : each participant appears in both groups, so that there are exactly the same participants in each group.
- The main problem with the repeated measures design is that there may well be order effects. Their experiences during the experiment may change the participants in various ways.
- They may perform better when they appear in the second group because they have gained useful information about the experiment or about the task. On the other hand, they may perform less well on the second occasion because of tiredness or boredom.
- Counterbalancing is the best way of preventing order effects from disrupting the findings of an experiment, and involves ensuring that each condition is equally likely to be used first and second by the participants
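Counterbalancing can be sketched in a few lines of Python; the participant IDs and condition labels below are hypothetical:

```python
# Counterbalancing sketch: half the participants complete condition A then B,
# the other half B then A, so each condition is equally often done first.
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]  # hypothetical IDs

orders = []
for i, p in enumerate(participants):
    if i % 2 == 0:
        orders.append((p, ["A", "B"]))  # A first
    else:
        orders.append((p, ["B", "A"]))  # B first

# Each condition appears first for exactly half the sample,
# so practice and fatigue effects are balanced across conditions.
first_conditions = [order[0] for _, order in orders]
print(first_conditions.count("A"), first_conditions.count("B"))  # 3 3
```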
If we wish to compare two groups with respect to a given independent variable, it is essential to make sure that the two groups do not differ in any other important way.
Experimental Methods
All experimental methods involve an IV (independent variable) and a DV (dependent variable).
- Lab Experiments are conducted in a well-controlled environment, not necessarily a laboratory, and therefore accurate and objective measurements are possible. The researcher decides where the experiment will take place, at what time, with which participants, in what circumstances, using a standardized procedure.
- Field experiments are conducted in the everyday (natural) environment of the participants. The experimenter still manipulates the IV, but in a real-life setting. It may be possible to control extraneous variables, though such control is more difficult than in a lab experiment.
- Natural experiments are when a naturally occurring IV is investigated that isn’t deliberately manipulated, it exists anyway. Participants are not randomly allocated, and the natural event may only occur rarely.
A case study is an in-depth investigation of a person, group, event, or community. It uses information from a range of sources, such as the person concerned and also their family and friends.
Many techniques may be used such as interviews, psychological tests, observations and experiments. Case studies are generally longitudinal: in other words, they follow the individual or group over an extended period of time.
Case studies are widely used in psychology, and among the best known are those carried out by Sigmund Freud . He conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.
Case studies provide rich qualitative data and have high levels of ecological validity. However, it is difficult to generalize from individual cases as each one has unique characteristics.

Correlational Studies
Correlation means association; it is a measure of the extent to which two variables are related. One of the variables can be regarded as the predictor variable with the other one as the outcome variable.
Correlational studies typically involve obtaining two different measures from a group of participants, and then assessing the degree of association between the measures.
The predictor variable can be seen as occurring before the outcome variable in some sense. It is called the predictor variable because it forms the basis for predicting the value of the outcome variable.
Relationships between variables can be displayed on a graph or as a numerical score called a correlation coefficient.

- If an increase in one variable tends to be associated with an increase in the other, then this is known as a positive correlation .
- If an increase in one variable tends to be associated with a decrease in the other, then this is known as a negative correlation .
- A zero correlation occurs when there is no relationship between variables.
After looking at the scattergraph, if we want to be sure that a significant relationship does exist between the two variables, a statistical test of correlation can be conducted, such as Spearman’s rho.
The test will give us a score, called a correlation coefficient . This is a value between -1 and +1, and the closer the score is to either extreme, the stronger the relationship between the variables. The coefficient can be positive, e.g. +0.63, or negative, e.g. -0.63.
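As an illustrative sketch, Spearman’s rho can be computed by ranking each variable and applying the formula rho = 1 − 6Σd²/(n(n² − 1)). The scores below are invented, and the simple ranking helper assumes there are no tied scores:

```python
def ranks(xs):
    # Rank 1 = smallest value; assumes no tied scores for simplicity.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

# Hypothetical scores: hours revised vs. test mark for five students.
hours = [2, 5, 1, 6, 4]
marks = [30, 58, 25, 70, 50]
print(spearman_rho(hours, marks))  # 1.0 — a perfect positive rank correlation
```

In practice a statistics package would also report whether the coefficient is significant for the sample size used.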

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.
Correlation does not prove causation, as a third variable may be involved.

Interview Methods
Interviews are commonly divided into two types: structured and unstructured.
- Structured interviews are formal, like job interviews. The interview situation is standardized as far as possible: a fixed, predetermined set of questions is put to every participant in the same order and in the same way. Responses are recorded on a questionnaire, and the researcher presets the order and wording of the questions, and sometimes the range of alternative answers. The interviewer stays within their role and maintains social distance from the interviewee.
- Unstructured interviews are informal, like casual conversations. A general conversation normally precedes them, and the researcher deliberately adopts an informal approach in an attempt to break down social barriers. There are no set questions; the participant can raise whatever topics they feel are relevant, and follow-up questions are posed about the participant’s answers. Unstructured interviews are most useful in qualitative research to analyze attitudes and values. Though they rarely provide a valid basis for generalization, their main advantage is that they enable the researcher to probe social actors’ subjective point of view.
Questionnaire Method
Questionnaires can be thought of as a kind of written interview. They can be carried out face to face, by telephone, or post.
The choice of questions is important because of the need to avoid bias or ambiguity in the questions, ‘leading’ the respondent or causing offense.
- Open questions are designed to encourage a full, meaningful answer using the subject’s own knowledge and feelings. They provide insights into feelings, opinions, and understanding. Example: “How do you feel about that situation?”
- Closed questions can be answered with a simple “yes” or “no” or specific information, limiting the depth of response. They are useful for gathering specific facts or confirming details. Example: “Do you feel anxious in crowds?”
- Postal questionnaires seem to offer the opportunity of getting around the problem of interview bias by reducing the personal involvement of the researcher. Their other practical advantages are that they are cheaper than face-to-face interviews and can be used to contact many respondents scattered over a wide area relatively quickly.
Observations
There are different types of observation methods :
- Covert observation is where the researcher doesn’t tell the participants that they are being observed until after the study is complete. There could be ethical problems of deception and consent with this particular observation method.
- Overt observation is where a researcher tells the participants that they are being observed and what they are being observed for.
- Controlled : behavior is observed under controlled laboratory conditions (e.g., Bandura’s Bobo doll study).
- Natural : Here, spontaneous behavior is recorded in a natural setting.
- Participant : Here, the observer has direct contact with the group of people they are observing. The researcher becomes a member of the group they are researching.
- Non-participant (aka “fly on the wall”): The researcher does not have direct contact with the people being observed; participants’ behavior is observed from a distance.
Pilot Study
A pilot study is a small-scale preliminary study conducted in order to evaluate the feasibility of the key steps in a future, full-scale project.
A pilot study is an initial run-through of the procedures to be used in an investigation; it involves selecting a few people and trying out the study on them. It is possible to save time, and in some cases, money, by identifying any flaws in the procedures designed by the researcher.
A pilot study can help the researcher spot any ambiguities (i.e., unclear wording) or confusion in the information given to participants, or problems with the task devised.
Sometimes the task is too hard, and the researcher may get a floor effect, because none of the participants can score well or complete the task – all performances are low.
The opposite effect is a ceiling effect, when the task is so easy that all achieve virtually full marks or top performances and are “hitting the ceiling”.
Research Design
In cross-sectional research , a researcher compares multiple segments of the population at the same time.
Sometimes we want to see how people change over time, as in studies of human development and lifespan. Longitudinal research is a research design in which data-gathering is administered repeatedly over an extended period of time.
In cohort studies , the participants must share a common factor or characteristic such as age, demographic, or occupation. A cohort study is a type of longitudinal study in which researchers monitor and observe a chosen population over an extended period.
Triangulation means using more than one research method to improve the validity of the study.
Reliability
Reliability is a measure of consistency, if a particular measurement is repeated and the same result is obtained then it is described as being reliable.
- Test-retest reliability : assessing the same person on two different occasions which shows the extent to which the test produces the same answers.
- Inter-observer reliability : the extent to which there is an agreement between two or more observers.
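Inter-observer reliability is often reported as simple percentage agreement between observers. Here is a minimal sketch; the observation codes below are invented for illustration:

```python
# Percentage agreement between two observers coding the same behaviour
# in ten observation intervals (codes invented for the example).
observer_a = ["play", "play", "rest", "play", "rest",
              "play", "rest", "rest", "play", "play"]
observer_b = ["play", "rest", "rest", "play", "rest",
              "play", "rest", "play", "play", "play"]

agreements = sum(a == b for a, b in zip(observer_a, observer_b))
agreement_pct = 100 * agreements / len(observer_a)
print(agreement_pct)  # 80.0
```

A common rule of thumb treats roughly 80% agreement or higher as acceptable, though stricter statistics (e.g. correlating the observers’ tallies) are often preferred.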
Meta-Analysis
A meta-analysis is a systematic review that involves identifying an aim and then searching for research studies that have addressed similar aims/hypotheses.
This is done by looking through various databases and then decisions are made about what studies are to be included/excluded.
Strengths: Increases the validity of the conclusions, as they are based on a wider range of studies.
Weaknesses: Research designs in studies can vary so they are not truly comparable.
Peer Review
A researcher submits an article to a journal. The choice of the journal may be determined by the journal’s audience or prestige.
The journal selects two or more appropriate experts (psychologists working in a similar field) to peer review the article without payment. The peer reviewers assess: the methods and designs used, originality of the findings, the validity of the original research findings and its content, structure and language.
Feedback from the reviewer determines whether the article is accepted. The article may be: Accepted as it is, accepted with revisions, sent back to the author to revise and re-submit or rejected without the possibility of submission.
The editor makes the final decision whether to accept or reject the research report based on the reviewers’ comments/recommendations.
Peer review is important because it prevents faulty data from entering the public domain, it provides a way of checking the validity of findings and the quality of the methodology, and it is used to assess the research rating of university departments.
Peer reviews may be an ideal, whereas in practice there are lots of problems. For example, it slows publication down and may prevent unusual, new work being published. Some reviewers might use it as an opportunity to prevent competing researchers from publishing work.
Some people doubt whether peer review can really prevent the publication of fraudulent research.
The advent of the internet means that far more research and academic comment is being published without official peer review than before, though systems are evolving online where everyone has a chance to offer their opinions and police the quality of research.
Types of Data
- Quantitative data is numerical data e.g. reaction time or number of mistakes. It represents how much or how long, how many there are of something. A tally of behavioral categories and closed questions in a questionnaire collect quantitative data.
- Qualitative data is virtually any type of information that can be observed and recorded that is not numerical in nature and can be in the form of written or verbal communication. Open questions in questionnaires and accounts from observational studies collect qualitative data.
- Primary data is first-hand data collected for the purpose of the investigation.
- Secondary data is information that has been collected by someone other than the person who is conducting the research e.g. taken from journals, books or articles.
Validity means how well a piece of research actually measures what it sets out to, or how well it reflects the reality it claims to represent.
Validity is whether the observed effect is genuine and represents what is actually out there in the world.
- Concurrent validity is the extent to which a psychological measure relates to an existing similar measure and obtains close results. For example, a new intelligence test compared to an established test.
- Face validity : does the test measure what it’s supposed to measure ‘on the face of it’. This is done by ‘eyeballing’ the measuring or by passing it to an expert to check.
- Ecological validity is the extent to which findings from a research study can be generalized to other settings / real life.
- Temporal validity is the extent to which findings from a research study can be generalized to other historical times.
Features of Science
- Paradigm – A set of shared assumptions and agreed methods within a scientific discipline.
- Paradigm shift – The result of the scientific revolution: a significant change in the dominant unifying theory within a scientific discipline.
- Objectivity – When all sources of personal bias are minimised so as not to distort or influence the research process.
- Empirical method – Scientific approaches that are based on the gathering of evidence through direct observation and experience.
- Replicability – The extent to which scientific procedures and findings can be repeated by other researchers.
- Falsifiability – The principle that a theory cannot be considered scientific unless it admits the possibility of being proved untrue.
Statistical Testing
A significant result is one where there is a low probability that chance factors were responsible for any observed difference, correlation or association in the variables tested.
If our test is significant, we can reject our null hypothesis and accept our alternative hypothesis.
If our test is not significant, we can accept our null hypothesis and reject our alternative hypothesis. A null hypothesis is a statement of no effect.
In psychology, we use p < 0.05 (as it strikes a balance between making a Type I and a Type II error), but p < 0.01 is used in research where an error could cause harm, such as trials of a new drug.
A type I error is when the null hypothesis is rejected when it should have been accepted (happens when a lenient significance level is used, an error of optimism).
A type II error is when the null hypothesis is accepted when it should have been rejected (happens when a stringent significance level is used, an error of pessimism).
Ethical Issues
- Informed consent means participants are able to make an informed judgment about whether to take part. However, revealing the study’s aims may cause participants to guess them and change their behavior.
- To deal with this, researchers can gain presumptive consent or ask participants to formally indicate their agreement to participate, but this may invalidate the purpose of the study, and it is not guaranteed that the participants would understand.
- Deception should only be used when it is approved by an ethics committee, as it involves deliberately misleading or withholding information. Participants should be fully debriefed after the study but debriefing can’t turn the clock back.
- All participants should be informed at the beginning that they have the right to withdraw if they ever feel distressed or uncomfortable.
- Withdrawal can cause bias, as those who stay tend to be more obedient, and some participants may not withdraw because they have been given incentives or feel they would be spoiling the study. Researchers can also offer the right to withdraw data after participation.
- Participants should all have protection from harm . The researcher should avoid risks greater than those experienced in everyday life and they should stop the study if any harm is suspected. However, the harm may not be apparent at the time of the study.
- Confidentiality concerns the communication of personal information. Researchers should not record any names but use numbers or false names instead, though full anonymity may not always be achievable, as it is sometimes possible to work out who the participants were.

Research Methods in Psychology: Types and Guidelines

As humanity has evolved and social issues have multiplied, we needed a systematic way to study human behavior in response to these issues and, consequently, to find solutions to thousands of problems. This led to the creation of a social science known as psychology, with different research methods in psychology being developed over time.
Using the trial-and-error method, researchers developed various types of research methods in psychology that have helped gather useful information.
So what are the research methods in psychology? We’ll go into detail on what psychology is and why this field of study is important. Then, you can decide which type of research method in psychology best fits your purpose as a student.
What Is Psychology, And Why Do People Study It?
These are all important in understanding complex human behavior. Data collected through these research methods can be demographic, physical, psychological or physiological.
However, it’s thanks to psychology that we can gain a better understanding of our relationship as humans with social problems in our immediate environment. The most prominent use of psychological research methods is in diagnosing mental health issues and approaches to solving them. There are also ethics guiding the various types of research methods in psychology.
What Are The Five Methods Of Research In Psychology?
There are five methods of research in psychology. Each of these methods is independent and can be used by students while writing their school papers. The methods are surveys, observational studies, case studies, content analysis, and psychological tests.
Surveys can be carried out through various channels, including paper, telephone, mail, in person or over the internet. In recent times, online surveys have become really popular due to their incredible flexibility. With online surveys, it’s easy to reach a broad audience and collect more samples for the research.
So, surveys can come in handy for documenting various social problems, including gender-based issues and poverty. However, while surveys are low-cost and provide the needed data, they also leave room for poor question design by researchers who are less skilled in writing questions.
The survey method of research can be classified into two based on methodology. These are quantitative and qualitative research.
- Quantitative research method: This is used to collect primarily numerical data. Put another way, the quantitative research method comes in handy for collecting statistical information towards getting valid results. If a researcher wants to find out how many people in an organization believe that mental health disorders exist, they would be able to collect valuable quantitative data using survey research. Several methods for this include systematic observation, face-to-face interviews and polls.
- Qualitative research method: As opposed to the quantitative method, the qualitative research method involves the collection of non-numerical data, so researchers use it to gather answers to strictly open-ended questions. An excellent example of this psychology research method is using surveys to collect customer feedback on a product. This can be done through focus groups, observations or one-on-one interviews.
This is one of the most hands-on methods of psychological research as it has a high success rate in acquiring accurate data. However, this is described as a non-experimental research method as there are no controls that could interfere with results.
It depends mainly on the study taking place in a natural setting but could involve mixed techniques, including quantitative and qualitative methods. There are two major types of observational study: naturalistic observation and participant observation.
- Naturalistic observation: The major feature of naturalistic observation is that it takes place in the natural environment of all participants. For example, psychologists could observe how incarcerated people in maximum-security prisons respond to several stimuli and the effect on their mental health. Naturalistic observation can be done overtly or covertly. When carried out overtly, participants are usually aware that they are being observed; as you can expect, this could negatively affect the collected data, as humans tend to modify their behavior when under observation. Alternatively, naturalistic observation can be carried out covertly in contexts where there is no expectation of privacy, so the data collected are as natural as possible because participants feel no need to be on their best behavior.
- Participant observation: In this study method, the researcher is directly involved in the research process. While the goals of the observational study remain the same, the researcher actively engages with the individuals in the study setting. This could be through a disguised method, where the researcher pretends to be a member of a group in order to collect data and study the individuals.
The advantage of the case study form of psychology research is that it provides a detailed understanding of a subject matter. Therefore, in psychology, a better understanding of a patient’s mental illness can be instrumental in deciding on effective treatment methods.
The major feature of case study research is that psychologists can interview participants directly and also observe their behavior for a specific period of the research, as practiced by Sigmund Freud. Freud used case studies consistently to analyze and diagnose patients and consequently help them combat various psychological ailments.
It’s no wonder that Jean Piaget’s cognitive development theory and Sigmund Freud’s psychoanalytic theory, both products of case study research, remain influential today.
The case study psychology research method uses techniques like:
- Unstructured interviews: While the individual goes about their daily life, researchers study their behavior and ask on-the-spot questions, which help them understand the patterns that drive the participant’s decisions.
- Psychological tests: Researchers can carry out various tests pertaining to mental health or social issues affecting the participants, such as mental health assessments and aptitude tests. This leads to the collection of direct data, increasing the accuracy of the results of the research.
This is a common research approach with clinical psychologists, who carry out content analysis using hand-written letters, business mail, patient interviews, and much more. Data collected through content analysis of these sources can help in the development of effective psychological treatment methods.
Forensic psychologists can also use content analysis to research severe mental health issues and behavioral problems.
The general steps in the content analysis are:
- Collect the data.
- Critically examine the research data to get familiar with the concepts.
- Develop a specific set of rules for selecting the smallest part of the concept for analysis; this is known as the coding unit.
- Create the coding units using the developed rules.
- Analyze the findings.
- Draw conclusions and determine the results of the research.
This research method in psychology is further subdivided into the conceptual and relational analysis.
Relational analysis involves:
- Selecting a concept to research.
- Dividing the concept into various categories.
- Identifying the relationships between the multiple categories related to the central concept.

Conceptual analysis involves:
- Selecting a concept to research. This could be a word, phrase or sentence.
- Dividing the concept into various categories. This makes it much easier to focus on the data that will provide useful information for the research.
- Examining how the concept occurs in the available research data.
- Coding and analyzing the final results.
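The tallying step of content analysis can be sketched with a simple word count; the coding units and transcript below are invented for illustration:

```python
from collections import Counter

# Content-analysis sketch: tally how often each coding unit (here, single
# words from a hypothetical category list) appears in an interview transcript.
coding_units = {"anxious", "worried", "calm", "relaxed"}
transcript = (
    "I felt anxious before the exam but calm afterwards. "
    "My friend was worried too, though she seemed relaxed later. "
    "I am often anxious."
)

words = [w.strip(".,").lower() for w in transcript.split()]
tally = Counter(w for w in words if w in coding_units)
print(dict(tally))  # {'anxious': 2, 'calm': 1, 'worried': 1, 'relaxed': 1}
```

Real content-analysis software would also handle phrases, stemming, and multiple coders, but the core idea is the same: count occurrences of predefined coding units.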
Various types of psychological tests include educational testing, aptitude testing, personality assessment and mental health assessment. These tests can consist of multiple-choice questions that must be carefully designed with factors like age, gender and qualification in mind.
However, it’s crucial for all participants to be adequately informed on the testing procedure and provided with instructions guiding the psychological tests. Psychological tests are governed by three important factors, and these are outlined below.
- It must be valid: This is the most obvious criterion, as your psychological test must be designed to measure the specific research variable. A good example here is that a mental health assessment should measure exactly that and not another parameter like physical health.
- It must be reliable: The data obtained from this test will drive the direction of your research and be used in various contexts. Therefore, the psychological test must be reliable, with almost negligible differences in repeated test scores.
- You must develop norms: These are the standard values that are a representation of the average performance of participants in a task. The importance of norms is that researchers use this to compare and interpret results from psychological tests. These could be percentile norms, age norms, grade norms or descriptive norms.
How To Carry Out Effective Psychology Research
These are some of the research methods psychology students engage with while in school, and they are a great help when working on theses. As you may have guessed by now, carrying out psychology research is no walk in the park. However, we have outlined the five major methods of research in psychology, and you can use whichever of them applies to the objective of your research.
Apart from these, your teacher or professor can have specific requests on the type of psychology research method to use and guidelines on getting the best result as a student.

Ch 2: Psychological Research Methods

Have you ever wondered whether the violence you see on television affects your behavior? Are you more likely to behave aggressively in real life after watching people behave violently in dramatic situations on the screen? Or, could seeing fictional violence actually get aggression out of your system, causing you to be more peaceful? How are children influenced by the media they are exposed to? A psychologist interested in the relationship between behavior and exposure to violent images might ask these very questions.
The topic of violence in the media today is contentious. Since ancient times, humans have been concerned about the effects of new technologies on our behaviors and thinking processes. The Greek philosopher Socrates, for example, worried that writing—a new technology at that time—would diminish people’s ability to remember because they could rely on written records rather than committing information to memory. In our world of quickly changing technologies, questions about the effects of media continue to emerge. Is it okay to talk on a cell phone while driving? Are headphones good to use in a car? What impact does text messaging have on reaction time while driving? These are types of questions that psychologist David Strayer asks in his lab.
Watch this short video to see how Strayer utilizes the scientific method to reach important conclusions regarding technology and driving safety.
You can view the transcript for “Understanding driver distraction” here.
How can we go about finding answers that are supported not by mere opinion, but by evidence that we can all agree on? The findings of psychological research can help us navigate issues like this.
Introduction to the Scientific Method
Learning Objectives
- Explain the steps of the scientific method
- Describe why the scientific method is important to psychology
- Summarize the processes of informed consent and debriefing
- Explain how research involving humans or animals is regulated

Scientists are engaged in explaining and understanding how the world around them works, and they are able to do so by coming up with theories that generate hypotheses that are testable and falsifiable. Theories that stand up to their tests are retained and refined, while those that do not are discarded or modified. In this way, research enables scientists to separate fact from simple opinion. Having good information generated from research aids in making wise decisions both in public policy and in our personal lives. In this section, you’ll see how psychologists use the scientific method to study and understand behavior.
The Scientific Process

The goal of all scientists is to better understand the world around them. Psychologists focus their attention on understanding behavior, as well as the cognitive (mental) and physiological (body) processes that underlie behavior. In contrast to other methods that people use to understand the behavior of others, such as intuition and personal experience, the hallmark of scientific research is that there is evidence to support a claim. Scientific knowledge is empirical: It is grounded in objective, tangible evidence that can be observed time and time again, regardless of who is observing.
While behavior is observable, the mind is not. If someone is crying, we can see the behavior. However, the reason for the behavior is more difficult to determine. Is the person crying due to being sad, in pain, or happy? Sometimes we can learn the reason for someone’s behavior by simply asking a question, like “Why are you crying?” However, there are situations in which an individual is either uncomfortable or unwilling to answer the question honestly, or is incapable of answering. For example, infants would not be able to explain why they are crying. In such circumstances, the psychologist must be creative in finding ways to better understand behavior. This module explores how scientific knowledge is generated, and how important that knowledge is in forming decisions in our personal lives and in the public domain.
Process of Scientific Research

Scientific knowledge is advanced through a process known as the scientific method. Basically, ideas (in the form of theories and hypotheses) are tested against the real world (in the form of empirical observations), and those empirical observations lead to more ideas that are tested against the real world, and so on.
The basic steps in the scientific method are:
- Observe a natural phenomenon and define a question about it
- Make a hypothesis, or potential solution to the question
- Test the hypothesis
- If the results support the hypothesis, seek further confirming evidence and actively look for counter-evidence
- If the results do not support the hypothesis, revise it or form a new one
- Draw conclusions and repeat; the scientific method is never-ending, and no result is ever considered final
In order to ask an important question that may improve our understanding of the world, a researcher must first observe natural phenomena. By making observations, a researcher can define a useful question. After finding a question to answer, the researcher can then make a prediction (a hypothesis) about what he or she thinks the answer will be. This prediction is usually a statement about the relationship between two or more variables. After making a hypothesis, the researcher will then design an experiment to test his or her hypothesis and evaluate the data gathered. These data will either support or refute the hypothesis. Based on the conclusions drawn from the data, the researcher will then find more evidence to support the hypothesis, look for counter-evidence to further strengthen the hypothesis, revise the hypothesis and create a new experiment, or continue to incorporate the information gathered to answer the research question.
Basic Principles of the Scientific Method
Two key concepts in the scientific approach are theory and hypothesis. A theory is a well-developed set of ideas that propose an explanation for observed phenomena that can be used to make predictions about future observations. A hypothesis is a testable prediction that is arrived at logically from a theory. It is often worded as an if-then statement (e.g., if I study all night, I will get a passing grade on the test). The hypothesis is extremely important because it bridges the gap between the realm of ideas and the real world. As specific hypotheses are tested, theories are modified and refined to reflect and incorporate the result of these tests.

Other key components in following the scientific method include verifiability, predictability, falsifiability, and fairness. Verifiability means that an experiment must be replicable by another researcher. To achieve verifiability, researchers must make sure to document their methods and clearly explain how their experiment is structured and why it produces certain results.
Predictability in a scientific theory implies that the theory should enable us to make predictions about future events. The precision of these predictions is a measure of the strength of the theory.
Falsifiability refers to whether a hypothesis can be disproved. For a hypothesis to be falsifiable, it must be logically possible to make an observation or perform an experiment that could show the hypothesis to be unsupported. A hypothesis that survives such testing is not thereby proved true; future testing may still disprove it. Falsifiability does not require that a hypothesis actually be shown to be false, only that it is capable of being tested.
To determine whether a hypothesis is supported or not supported, psychological researchers must conduct hypothesis testing using statistics. Hypothesis testing is a set of statistical procedures that assess how likely the observed results would be if chance alone were operating. If hypothesis testing reveals that results were “statistically significant,” this means that there was support for the hypothesis and that the researchers can be reasonably confident that their result was not due to random chance. If the results are not statistically significant, this means that the researchers’ hypothesis was not supported.
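To make the logic of significance testing concrete, here is a minimal sketch in Python. The reaction-time numbers are entirely invented for illustration, and the scenario (texting vs. non-texting drivers) is a hypothetical borrowed from the driving examples earlier in the chapter. It implements a simple permutation test, one of several ways to estimate how often chance alone would produce a group difference as large as the one observed:

```python
import random
import statistics

random.seed(42)

# Hypothetical reaction times (ms) for two groups of drivers.
# These numbers are invented purely to illustrate the mechanics.
texting = [531, 502, 548, 517, 560, 525, 539, 510, 544, 528]
control = [478, 495, 462, 488, 471, 500, 455, 483, 469, 490]

observed_diff = statistics.mean(texting) - statistics.mean(control)

# Permutation test: if the group labels were meaningless, how often
# would a random relabeling produce a difference at least this large?
pooled = texting + control
n_permutations = 10_000
count_extreme = 0
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:10]) - statistics.mean(pooled[10:])
    if abs(diff) >= abs(observed_diff):
        count_extreme += 1

p_value = count_extreme / n_permutations
print(f"observed difference = {observed_diff:.1f} ms, p = {p_value:.4f}")
# A small p (conventionally < .05) means the observed difference would
# be very unlikely if texting made no real difference.
```

A conventional parametric test (e.g., an independent-samples t-test) answers the same question under additional distributional assumptions.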
Fairness implies that all data must be considered when evaluating a hypothesis. A researcher cannot pick and choose what data to keep and what to discard or focus specifically on data that support or do not support a particular hypothesis. All data must be accounted for, even if they invalidate the hypothesis.
Applying the Scientific Method
To see how this process works, let’s consider a specific theory and a hypothesis that might be generated from that theory. As you’ll learn in a later module, the James-Lange theory of emotion asserts that emotional experience relies on the physiological arousal associated with the emotional state. If you walked out of your home and discovered a very aggressive snake waiting on your doorstep, your heart would begin to race and your stomach churn. According to the James-Lange theory, these physiological changes would result in your feeling of fear. A hypothesis that could be derived from this theory might be that a person who is unaware of the physiological arousal that the sight of the snake elicits will not feel fear.
Remember that a good scientific hypothesis is falsifiable, or capable of being shown to be incorrect. Recall from the introductory module that Sigmund Freud had lots of interesting ideas to explain various human behaviors (Figure 5). However, a major criticism of Freud’s theories is that many of his ideas are not falsifiable; for example, it is impossible to imagine empirical observations that would disprove the existence of the id, the ego, and the superego—the three elements of personality described in Freud’s theories. Despite this, Freud’s theories are widely taught in introductory psychology texts because of their historical significance for personality psychology and psychotherapy.

In contrast, the James-Lange theory does generate falsifiable hypotheses, such as the one described above. Some individuals who suffer significant injuries to their spinal columns are unable to feel the bodily changes that often accompany emotional experiences. Therefore, we could test the hypothesis by determining how emotional experiences differ between individuals who have the ability to detect these changes in their physiological arousal and those who do not. In fact, this research has been conducted and while the emotional experiences of people deprived of an awareness of their physiological arousal may be less intense, they still experience emotion (Chwalisz, Diener, & Gallagher, 1988).
Why the Scientific Method Is Important for Psychology
The use of the scientific method is one of the main features that separates modern psychology from earlier philosophical inquiries about the mind. Compared to chemistry, physics, and other “natural sciences,” psychology has long been considered one of the “social sciences” because of the subjective nature of the things it seeks to study. Many of the concepts that psychologists are interested in—such as aspects of the human mind, behavior, and emotions—are subjective and cannot be directly measured. Psychologists often rely instead on behavioral observations and self-reported data, which are considered by some to be illegitimate or lacking in methodological rigor. Applying the scientific method to psychology, therefore, helps to standardize the approach to understanding its very different types of information.
The scientific method allows psychological data to be replicated and confirmed in many instances, under different circumstances, and by a variety of researchers. Through replication of experiments, new generations of psychologists can reduce errors and broaden the applicability of theories. It also allows theories to be tested and validated instead of simply being conjectures that could never be verified or falsified. All of this allows psychologists to gain a stronger understanding of how the human mind works.
Scientific articles published in journals and psychology papers written in the style of the American Psychological Association (i.e., in “APA style”) are structured around the scientific method. These papers include an Introduction, which presents background information and outlines the hypotheses; a Methods section, which details how the experiment was conducted to test the hypothesis; a Results section, which reports the statistics that tested the hypothesis and states whether it was supported; and a Discussion and Conclusion, which spell out the implications of finding support, or no support, for the hypothesis. Writing articles and papers that adhere to the scientific method makes it easy for future researchers to repeat the study and attempt to replicate the results.
Ethics in Research
Today, scientists agree that good research is ethical in nature and is guided by a basic respect for human dignity and safety. However, as you will read in the Tuskegee Syphilis Study, this has not always been the case. Modern researchers must demonstrate that the research they perform is ethically sound. This section presents how ethical considerations affect the design and implementation of research conducted today.
Research Involving Human Participants
Any experiment involving the participation of human subjects is governed by extensive, strict guidelines designed to ensure that the experiment does not result in harm. Any research institution that receives federal support for research involving human participants must have access to an institutional review board (IRB). The IRB is a committee of individuals often made up of members of the institution’s administration, scientists, and community members (Figure 6). The purpose of the IRB is to review proposals for research that involves human participants. The IRB reviews these proposals with principles of respect for human dignity and safety in mind, and generally, approval from the IRB is required in order for the experiment to proceed.

An institution’s IRB requires several components in any experiment it approves. For one, each participant must sign an informed consent form before they can participate in the experiment. An informed consent form provides a written description of what participants can expect during the experiment, including potential risks and implications of the research. It also lets participants know that their involvement is completely voluntary and can be discontinued without penalty at any time. Furthermore, the informed consent guarantees that any data collected in the experiment will remain completely confidential. In cases where research participants are under the age of 18, the parents or legal guardians are required to sign the informed consent form.
While the informed consent form should be as honest as possible in describing exactly what participants will be doing, sometimes deception is necessary to prevent participants’ knowledge of the exact research question from affecting the results of the study. Deception involves purposely misleading experiment participants in order to maintain the integrity of the experiment, but not to the point where the deception could be considered harmful. For example, if we are interested in how our opinion of someone is affected by their attire, we might use deception in describing the experiment to prevent that knowledge from affecting participants’ responses. In cases where deception is involved, participants must receive a full debriefing upon conclusion of the study—complete, honest information about the purpose of the experiment, how the data collected will be used, the reasons why deception was necessary, and information about how to obtain additional information about the study.
Dig Deeper: Ethics and the Tuskegee Syphilis Study
Unfortunately, the ethical guidelines that exist for research today were not always applied in the past. In 1932, poor, rural, black, male sharecroppers from Tuskegee, Alabama, were recruited to participate in an experiment conducted by the U.S. Public Health Service, with the aim of studying syphilis in black men (Figure 7). In exchange for free medical care, meals, and burial insurance, 600 men agreed to participate in the study. A little more than half of the men tested positive for syphilis, and they served as the experimental group (given that the researchers could not randomly assign participants to groups, this represents a quasi-experiment). The remaining syphilis-free individuals served as the control group. However, those individuals that tested positive for syphilis were never informed that they had the disease.
While there was no treatment for syphilis when the study began, by 1947 penicillin was recognized as an effective treatment for the disease. Despite this, no penicillin was administered to the participants in this study, and the participants were not allowed to seek treatment at any other facilities if they continued in the study. Over the course of 40 years, many of the participants unknowingly spread syphilis to their wives (and subsequently their children born from their wives) and eventually died because they never received treatment for the disease. This study was discontinued in 1972 when the experiment was discovered by the national press (Tuskegee University, n.d.). The resulting outrage over the experiment led directly to the National Research Act of 1974 and the strict ethical guidelines for research on humans described in this chapter. Why is this study unethical? How were the men who participated and their families harmed as a function of this research?

Learn more about the Tuskegee Syphilis Study on the CDC website .
Research Involving Animal Subjects

Many psychologists also conduct research involving animal subjects, because many basic physiological and behavioral processes are similar enough across species to make such research informative. This does not mean that animal researchers are immune to ethical concerns. Indeed, the humane and ethical treatment of animal research subjects is a critical aspect of this type of research. Researchers must design their experiments to minimize any pain or distress experienced by animals serving as research subjects.
Whereas IRBs review research proposals that involve human participants, animal experimental proposals are reviewed by an Institutional Animal Care and Use Committee (IACUC). An IACUC consists of institutional administrators, scientists, veterinarians, and community members. This committee is charged with ensuring that all experimental proposals provide for the humane treatment of animal research subjects. It also conducts semi-annual inspections of all animal facilities to ensure that the research protocols are being followed. No animal research project can proceed without the committee’s approval.
Introduction to Approaches to Research
Learning Objectives
- Differentiate between descriptive, correlational, and experimental research
- Explain the strengths and weaknesses of case studies, naturalistic observation, and surveys
- Describe the strengths and weaknesses of archival research
- Compare longitudinal and cross-sectional approaches to research
- Explain what a correlation coefficient tells us about the relationship between variables
- Describe why correlation does not mean causation
- Describe the experimental process, including ways to control for bias
- Identify and differentiate between independent and dependent variables

Psychologists use descriptive, experimental, and correlational methods to conduct research. Descriptive, or qualitative, methods include the case study, naturalistic observation, surveys, archival research, longitudinal research, and cross-sectional research.
Experiments are conducted in order to determine cause-and-effect relationships. In ideal experimental design, the only difference between the experimental and control groups is whether participants are exposed to the experimental manipulation. Each group goes through all phases of the experiment, but each group will experience a different level of the independent variable: the experimental group is exposed to the experimental manipulation, and the control group is not. The researcher then measures the changes that are produced in the dependent variable in each group. Once data are collected from both groups, they are analyzed statistically to determine if there are meaningful differences between the groups.
When scientists passively observe and measure phenomena it is called correlational research. Here, psychologists do not intervene and change behavior, as they do in experiments. In correlational research, they identify patterns of relationships, but usually cannot infer what causes what. Importantly, each correlation describes the relationship between exactly two variables; a single correlational study may measure many variables, but every correlation coefficient links just one pair.
Watch It: More on Research
If you enjoy learning through lectures and want an interesting and comprehensive summary of this section, then click on the YouTube link to watch a lecture given by MIT Professor John Gabrieli. Start at the 30:45 mark and watch through the end to hear examples of actual psychological studies and how they were analyzed. Listen for references to independent and dependent variables, experimenter bias, and double-blind studies. In the lecture, you’ll learn about breaking social norms, “WEIRD” research, why expectations matter, how a warm cup of coffee might make you nicer, why you should change your answer on a multiple choice test, and why praise for intelligence won’t make you any smarter.
You can view the transcript for “Lec 2 | MIT 9.00SC Introduction to Psychology, Spring 2011” here.
Descriptive Research
There are many research methods available to psychologists in their efforts to understand, describe, and explain behavior and the cognitive and biological processes that underlie it. Some methods rely on observational techniques. Other approaches involve interactions between the researcher and the individuals who are being studied—ranging from a series of simple questions to extensive, in-depth interviews—to well-controlled experiments.
The three main categories of psychological research are descriptive, correlational, and experimental research. Research studies that do not test specific relationships between variables are called descriptive, or qualitative, studies. These studies are used to describe general or specific behaviors and attributes that are observed and measured. In the early stages of research it might be difficult to form a hypothesis, especially when there is little or no existing literature in the area. In these situations, designing an experiment would be premature, as the question of interest is not yet clearly defined as a hypothesis. Often a researcher will begin with a non-experimental approach, such as a descriptive study, to gather more information about the topic before designing an experiment or correlational study to address a specific hypothesis. Descriptive research is distinct from correlational research, in which psychologists formally test whether a relationship exists between two or more variables. Experimental research goes a step beyond descriptive and correlational research and randomly assigns people to different conditions, using hypothesis testing to make inferences about how these conditions affect behavior. It aims to determine if one variable directly impacts and causes another. Correlational and experimental research both typically use hypothesis testing, whereas descriptive research does not.
Each of these research methods has unique strengths and weaknesses, and each method may only be appropriate for certain types of research questions. For example, studies that rely primarily on observation produce incredible amounts of information, but the ability to apply this information to the larger population is somewhat limited because of small sample sizes. Survey research, on the other hand, allows researchers to easily collect data from relatively large samples. While this allows for results to be generalized to the larger population more easily, the information that can be collected on any given survey is somewhat limited and subject to problems associated with any type of self-reported data. Some researchers conduct archival research by using existing records. While this can be a fairly inexpensive way to collect data that can provide insight into a number of research questions, researchers using this approach have no control over how or what kind of data were collected.
Correlational research can find a relationship between two variables, but the only way a researcher can claim that the relationship between the variables is cause and effect is to perform an experiment. In experimental research, which will be discussed later in the text, there is a tremendous amount of control over variables of interest. While this is a powerful approach, experiments are often conducted in very artificial settings. This calls into question the validity of experimental findings with regard to how they would apply in real-world settings. In addition, many of the questions that psychologists would like to answer cannot be pursued through experimental research because of ethical concerns.
The three main types of descriptive studies are naturalistic observation, case studies, and surveys.
Naturalistic Observation
If you want to understand how behavior occurs, one of the best ways to gain information is to simply observe the behavior in its natural context. However, people might change their behavior in unexpected ways if they know they are being observed. How do researchers obtain accurate information when people tend to hide their natural behavior? As an example, imagine that your professor asks everyone in your class to raise their hand if they always wash their hands after using the restroom. Chances are that almost everyone in the classroom will raise their hand, but do you think hand washing after every trip to the restroom is really that universal?
This is very similar to the phenomenon mentioned earlier in this module: many individuals do not feel comfortable answering a question honestly. But if we are committed to finding out the facts about hand washing, we have other options available to us.
Suppose we send a classmate into the restroom to actually watch whether everyone washes their hands after using the restroom. Will our observer blend into the restroom environment by wearing a white lab coat, sitting with a clipboard, and staring at the sinks? We want our researcher to be inconspicuous—perhaps standing at one of the sinks pretending to put in contact lenses while secretly recording the relevant information. This type of observational study is called naturalistic observation: observing behavior in its natural setting. To better understand peer exclusion, Suzanne Fanger collaborated with colleagues at the University of Texas to observe the behavior of preschool children on a playground. How did the observers remain inconspicuous over the duration of the study? They equipped a few of the children with wireless microphones (which the children quickly forgot about) and observed while taking notes from a distance. Also, the children in that particular preschool (a “laboratory preschool”) were accustomed to having observers on the playground (Fanger, Frankel, & Hazen, 2012).

It is critical that the observer be as unobtrusive and as inconspicuous as possible: when people know they are being watched, they are less likely to behave naturally. If you have any doubt about this, ask yourself how your driving behavior might differ in two situations: In the first situation, you are driving down a deserted highway during the middle of the day; in the second situation, you are being followed by a police car down the same deserted highway (Figure 9).
It should be pointed out that naturalistic observation is not limited to research involving humans. Indeed, some of the best-known examples of naturalistic observation involve researchers going into the field to observe various kinds of animals in their own environments. As with human studies, the researchers maintain their distance and avoid interfering with the animal subjects so as not to influence their natural behaviors. Scientists have used this technique to study social hierarchies and interactions among animals ranging from ground squirrels to gorillas. The information provided by these studies is invaluable in understanding how those animals organize socially and communicate with one another. The anthropologist Jane Goodall, for example, spent nearly five decades observing the behavior of chimpanzees in Africa (Figure 10). As an illustration of the types of concerns that a researcher might encounter in naturalistic observation, some scientists criticized Goodall for giving the chimps names instead of referring to them by numbers—using names was thought to undermine the emotional detachment required for the objectivity of the study (McKie, 2010).

The greatest benefit of naturalistic observation is the validity, or accuracy, of information collected unobtrusively in a natural setting. Having individuals behave as they normally would in a given situation means that we have a higher degree of ecological validity, or realism, than we might achieve with other research approaches. Therefore, our ability to generalize the findings of the research to real-world situations is enhanced. If done correctly, we need not worry about people or animals modifying their behavior simply because they are being observed. Sometimes, people may assume that reality programs give us a glimpse into authentic human behavior. However, the principle of inconspicuous observation is violated as reality stars are followed by camera crews and are interviewed on camera for personal confessionals. Given that environment, we must doubt how natural and realistic their behaviors are.
The major downside of naturalistic observation is that such studies are often difficult to set up and control. In our restroom study, what if you stood in the restroom all day prepared to record people’s hand washing behavior and no one came in? Or, what if you have been closely observing a troop of gorillas for weeks only to find that they migrated to a new place while you were sleeping in your tent? The benefit of realistic data comes at a cost. As a researcher you have no control over when (or if) you have behavior to observe. In addition, this type of observational research often requires significant investments of time, money, and a good dose of luck.
Sometimes studies involve structured observation. In these cases, people are observed while engaging in set, specific tasks. An excellent example of structured observation comes from the Strange Situation procedure developed by Mary Ainsworth (you will read more about this in the module on lifespan development). The Strange Situation is a procedure used to evaluate attachment styles that exist between an infant and caregiver. In this scenario, caregivers bring their infants into a room filled with toys. The Strange Situation involves a number of phases, including a stranger coming into the room, the caregiver leaving the room, and the caregiver’s return to the room. The infant’s behavior is closely monitored at each phase, but it is the behavior of the infant upon being reunited with the caregiver that is most telling in terms of characterizing the infant’s attachment style with the caregiver.
Another potential problem in observational research is observer bias. Generally, people who act as observers are closely involved in the research project and may unconsciously skew their observations to fit their research goals or expectations. To protect against this type of bias, researchers should have clear criteria established for the types of behaviors recorded and how those behaviors should be classified. In addition, researchers often compare observations of the same event by multiple observers, in order to test inter-rater reliability: a measure of reliability that assesses the consistency of observations by different observers.
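Inter-rater reliability can be quantified in several ways. The Python sketch below uses hypothetical ratings from two observers (invented for illustration) and computes raw percent agreement alongside Cohen’s kappa, a common statistic that corrects agreement for what two observers would agree on by chance alone:

```python
# Two observers independently classify the same 12 playground episodes
# as "aggressive" (A) or "non-aggressive" (N). Hypothetical ratings.
observer_1 = ["A", "N", "N", "A", "A", "N", "N", "N", "A", "N", "A", "N"]
observer_2 = ["A", "N", "A", "A", "A", "N", "N", "N", "A", "N", "N", "N"]

n = len(observer_1)

# Raw agreement: the fraction of episodes the observers classify alike.
agreement = sum(a == b for a, b in zip(observer_1, observer_2)) / n

# Cohen's kappa corrects for agreement expected by chance alone.
p_a1 = observer_1.count("A") / n  # observer 1's rate of "A"
p_a2 = observer_2.count("A") / n  # observer 2's rate of "A"
p_chance = p_a1 * p_a2 + (1 - p_a1) * (1 - p_a2)
kappa = (agreement - p_chance) / (1 - p_chance)

print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
# Kappa near 1 indicates strong reliability beyond chance;
# kappa near 0 indicates agreement no better than chance.
```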
Case Studies
In 2011, the New York Times published a feature story on Krista and Tatiana Hogan, Canadian twin girls. These particular twins are unique because Krista and Tatiana are conjoined twins, connected at the head. There is evidence that the two girls are connected in a part of the brain called the thalamus, which is a major sensory relay center. Most incoming sensory information is sent through the thalamus before reaching higher regions of the cerebral cortex for processing.
The implications of this potential connection mean that it might be possible for one twin to experience the sensations of the other twin. For instance, if Krista is watching a particularly funny television program, Tatiana might smile or laugh even if she is not watching the program. This particular possibility has piqued the interest of many neuroscientists who seek to understand how the brain uses sensory information.
These twins represent an enormous resource in the study of the brain, and since their condition is very rare, it is likely that as long as their family agrees, scientists will follow these girls very closely throughout their lives to gain as much information as possible (Dominus, 2011).
In observational research, scientists are conducting a clinical or case study when they focus on one person or just a few individuals. Indeed, some scientists spend their entire careers studying just 10–20 individuals. Why would they do this? Obviously, when they focus their attention on a very small number of people, they can gain a tremendous amount of insight into those cases. The richness of information that is collected in clinical or case studies is unmatched by any other single research method. This allows the researcher to have a very deep understanding of the individuals and the particular phenomenon being studied.
If clinical or case studies provide so much information, why are they not used more often by researchers? As it turns out, the major benefit of this particular approach is also a weakness. As mentioned earlier, this approach is often used when studying individuals who are interesting to researchers because they have a rare characteristic. Therefore, the individuals who serve as the focus of case studies are not like most other people. If scientists ultimately want to explain all behavior, focusing attention on such a special group of people can make it difficult to generalize any observations to the larger population as a whole. Generalizing refers to the ability to apply the findings of a particular research project to larger segments of society. Again, case studies provide enormous amounts of information, but since the cases are so specific, the potential to apply what’s learned to the average person may be very limited.
Often, psychologists develop surveys as a means of gathering data. Surveys are lists of questions to be answered by research participants, and can be delivered as paper-and-pencil questionnaires, administered electronically, or conducted verbally (Figure 11). Generally, the survey itself can be completed in a short time, and the ease of administering a survey makes it easy to collect data from a large number of people.
Surveys allow researchers to gather data from larger samples than may be afforded by other research methods. A sample is a subset of individuals selected from a population, which is the overall group of individuals that the researchers are interested in. Researchers study the sample and seek to generalize their findings to the population.

Surveys have both strengths and weaknesses in comparison to case studies. By using surveys, we can collect information from a larger sample of people. A larger sample is better able to reflect the actual diversity of the population, thus allowing better generalizability. Therefore, if our sample is sufficiently large and diverse, we can assume that the data we collect from the survey can be generalized to the larger population with more certainty than the information collected through a case study. However, given the greater number of people involved, we are not able to collect the same depth of information on each person that would be collected in a case study.
Another potential weakness of surveys is something we touched on earlier in this chapter: people don’t always give accurate responses. They may lie, misremember, or answer questions in a way that they think makes them look good. For example, people may report drinking less alcohol than is actually the case.
Any number of research questions can be answered through the use of surveys. One real-world example is the research conducted by Jenkins, Ruppel, Kizer, Yehl, and Griffin (2012) about the backlash against the US Arab-American community following the terrorist attacks of September 11, 2001. Jenkins and colleagues wanted to determine to what extent these negative attitudes toward Arab-Americans still existed nearly a decade after the attacks occurred. In one study, 140 research participants filled out a survey with 10 questions, including questions asking directly about the participant’s overt prejudicial attitudes toward people of various ethnicities. The survey also asked indirect questions about how likely the participant would be to interact with a person of a given ethnicity in a variety of settings (such as, “How likely do you think it is that you would introduce yourself to a person of Arab-American descent?”). The results of the research suggested that participants were unwilling to report prejudicial attitudes toward any ethnic group. However, there were significant differences between their pattern of responses to questions about social interaction with Arab-Americans compared to other ethnic groups: they indicated less willingness for social interaction with Arab-Americans compared to the other ethnic groups. This suggested that the participants harbored subtle forms of prejudice against Arab-Americans, despite their assertions that this was not the case (Jenkins et al., 2012).
Archival Research

Archival research relies on examining past records or existing data sets, such as census data, court records, or patient files, to look for patterns or relationships. In comparing archival research to other research methods, there are several important distinctions. For one, the researcher employing archival research never directly interacts with research participants. Therefore, the investment of time and money to collect data is considerably less with archival research. Additionally, researchers have no control over what information was originally collected. Therefore, research questions have to be tailored so they can be answered within the structure of the existing data sets. There is also no guarantee of consistency between the records from one source to another, which might make comparing and contrasting different data sets problematic.
Longitudinal and Cross-Sectional Research
Sometimes we want to see how people change over time, as in studies of human development and lifespan. When we test the same group of individuals repeatedly over an extended period of time, we are conducting longitudinal research: a research design in which data are gathered from the same participants repeatedly over an extended period. For example, we may survey a group of individuals about their dietary habits at age 20, retest them a decade later at age 30, and then again at age 40.
Another approach is cross-sectional research. In cross-sectional research, a researcher compares multiple segments of the population at the same time. Using the dietary habits example above, the researcher might directly compare different groups of people by age. Instead of observing a group of people for 20 years to see how their dietary habits changed from decade to decade, the researcher would study a group of 20-year-old individuals and compare them to a group of 30-year-old individuals and a group of 40-year-old individuals. While cross-sectional research requires a shorter-term investment, it is also limited by differences that exist between the different generations (or cohorts) that have nothing to do with age per se, but rather reflect the social and cultural experiences that make different generations of individuals different from one another.
To illustrate this concept, consider the following survey findings. In recent years there has been significant growth in the popular support of same-sex marriage. Many studies on this topic break down survey participants into different age groups. In general, younger people are more supportive of same-sex marriage than are those who are older (Jones, 2013). Does this mean that as we age we become less open to the idea of same-sex marriage, or does this mean that older individuals have different perspectives because of the social climates in which they grew up? Longitudinal research is a powerful approach because the same individuals are involved in the research project over time, which means that the researchers need to be less concerned with differences among cohorts affecting the results of their study.
Often longitudinal studies are employed when researching various diseases in an effort to understand particular risk factors. Such studies often involve tens of thousands of individuals who are followed for several decades. Given the enormous number of people involved in these studies, researchers can feel confident that their findings can be generalized to the larger population. The Cancer Prevention Study-3 (CPS-3) is one of a series of longitudinal studies sponsored by the American Cancer Society aimed at determining predictive risk factors associated with cancer. When participants enter the study, they complete a survey about their lives and family histories, providing information on factors that might cause or prevent the development of cancer. Then every few years the participants receive additional surveys to complete. In the end, hundreds of thousands of participants will be tracked over 20 years to determine which of them develop cancer and which do not.
Clearly, this type of research is important and potentially very informative. For instance, earlier longitudinal studies sponsored by the American Cancer Society provided some of the first scientific demonstrations of the now well-established links between increased rates of cancer and smoking (American Cancer Society, n.d.) (Figure 13).

As with any research strategy, longitudinal research is not without limitations. For one, these studies require an incredible time investment by the researcher and research participants. Given that some longitudinal studies take years, if not decades, to complete, the results will not be known for a considerable period of time. In addition to the time demands, these studies also require a substantial financial investment. Many researchers are unable to commit the resources necessary to see a longitudinal project through to the end.
Research participants must also be willing to continue their participation for an extended period of time, and this can be problematic. People move, get married and take new names, get ill, and eventually die. Even without significant life changes, some people may simply choose to discontinue their participation in the project. As a result, the attrition rates, or reduction in the number of research participants due to dropouts, in longitudinal studies are quite high and increase over the course of a project. For this reason, researchers using this approach typically recruit many participants, fully expecting that a substantial number will drop out before the end. As the study progresses, they continually check whether the sample still represents the larger population, and make adjustments as necessary.
Correlational Research
Did you know that as sales in ice cream increase, so does the overall rate of crime? Is it possible that indulging in your favorite flavor of ice cream could send you on a crime spree? Or, after committing crime do you think you might decide to treat yourself to a cone? There is no question that a relationship exists between ice cream and crime (e.g., Harper, 2013), but it would be pretty foolish to decide that one thing actually caused the other to occur.
It is much more likely that both ice cream sales and crime rates are related to the temperature outside. When the temperature is warm, there are lots of people out of their houses, interacting with each other, getting annoyed with one another, and sometimes committing crimes. Also, when it is warm outside, we are more likely to seek a cool treat like ice cream. How do we determine if there is indeed a relationship between two things? And when there is a relationship, how can we discern whether it is attributable to coincidence or causation?

Correlation Does Not Indicate Causation
Correlational research is useful because it allows us to discover the strength and direction of relationships that exist between two variables. However, correlation is limited because establishing the existence of a relationship tells us little about cause and effect. While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable, is actually causing the systematic movement in our variables of interest. In the ice cream/crime rate example mentioned earlier, temperature is a confounding variable that could account for the relationship between the two variables.
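The ice cream/crime pattern is easy to reproduce in a short simulation. In this sketch, with all numbers invented, temperature drives both variables and neither has any direct effect on the other, yet the two end up strongly correlated:

```python
# Hypothetical sketch: a confounding variable (temperature) producing a
# strong correlation between two variables that do not affect each other.
import random
import statistics

random.seed(42)

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# The confound: daily temperatures (degrees C) over 200 days.
temperature = [random.uniform(0, 35) for _ in range(200)]

# Each variable depends on temperature plus its own unrelated noise;
# ice cream sales never appear in the crime formula, or vice versa.
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temperature]
crime_rate = [1.5 * t + random.gauss(0, 5) for t in temperature]

print(f"r(ice cream, crime) = {pearson_r(ice_cream_sales, crime_rate):.2f}")
```

The printed correlation is strongly positive even though, by construction, there is no causal link between the two variables: both simply track the confound.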
Even when we cannot point to clear confounding variables, we should not assume that a correlation between two variables implies that one variable causes changes in another. This can be frustrating when a cause-and-effect relationship seems clear and intuitive. Think back to our discussion of the research done by the American Cancer Society and how their research projects were some of the first demonstrations of the link between smoking and cancer. It seems reasonable to assume that smoking causes cancer, but if we were limited to correlational research, we would be overstepping our bounds by making this assumption.

Unfortunately, people mistakenly make claims of causation as a function of correlations all the time. Such claims are especially common in advertisements and news stories. For example, recent research found that people who eat cereal on a regular basis achieve healthier weights than those who rarely eat cereal (Frantzen, Treviño, Echon, Garcia-Dominic, & DiMarco, 2013; Barton et al., 2005). Guess how the cereal companies report this finding. Does eating cereal really cause an individual to maintain a healthy weight, or are there other possible explanations, such as, someone at a healthy weight is more likely to regularly eat a healthy breakfast than someone who is obese or someone who avoids meals in an attempt to diet (Figure 15)? While correlational research is invaluable in identifying relationships among variables, a major limitation is the inability to establish causality. Psychologists want to make statements about cause and effect, but the only way to do that is to conduct an experiment to answer a research question. The next section describes how scientific experiments incorporate methods that eliminate, or control for, alternative explanations, which allow researchers to explore how changes in one variable cause changes in another variable.
Watch this clip from Freakonomics for an example of how correlation does not indicate causation.
Illusory Correlations
The temptation to make erroneous cause-and-effect statements based on correlational research is not the only way we tend to misinterpret data. We also tend to make the mistake of illusory correlations, especially with unsystematic observations. Illusory correlations, or false correlations, occur when people believe that relationships exist between two things when no such relationship exists. One well-known illusory correlation is the supposed effect that the moon’s phases have on human behavior. Many people passionately assert that human behavior is affected by the phase of the moon, and specifically, that people act strangely when the moon is full (Figure 16).

There is no denying that the moon exerts a powerful influence on our planet. The ebb and flow of the ocean’s tides are tightly tied to the gravitational forces of the moon. Many people believe, therefore, that it is logical that we are affected by the moon as well. After all, our bodies are largely made up of water. A meta-analysis of nearly 40 studies consistently demonstrated, however, that the relationship between the moon and our behavior does not exist (Rotton & Kelly, 1985). While we may pay more attention to odd behavior during the full phase of the moon, the rates of odd behavior remain constant throughout the lunar cycle.
Why are we so apt to believe in illusory correlations like this? Often we read or hear about them and simply accept the information as valid. Or, we have a hunch about how something works and then look for evidence to support that hunch, ignoring evidence that would tell us our hunch is false; this is known as confirmation bias . Other times, we find illusory correlations based on the information that comes most easily to mind, even if that information is severely limited. And while we may feel confident that we can use these relationships to better understand and predict the world around us, illusory correlations can have significant drawbacks. For example, research suggests that illusory correlations—in which certain behaviors are inaccurately attributed to certain groups—are involved in the formation of prejudicial attitudes that can ultimately lead to discriminatory behavior (Fiedler, 2004).
We all have a tendency to make illusory correlations from time to time. Try to think of an illusory correlation that is held by you, a family member, or a close friend. How do you think this illusory correlation came about and what can be done in the future to combat them?
Experiments
Causality: Conducting Experiments and Using the Data

Experimental Hypothesis
In order to conduct an experiment, a researcher must have a specific hypothesis to be tested. As you’ve learned, hypotheses can be formulated either through direct observation of the real world or after careful review of previous research. For example, if you think that children should not be allowed to watch violent programming on television because doing so would cause them to behave more violently, then you have basically formulated a hypothesis—namely, that watching violent television programs causes children to behave more violently. How might you have arrived at this particular hypothesis? You may have younger relatives who watch cartoons featuring characters using martial arts to save the world from evildoers, with an impressive array of punching, kicking, and defensive postures. You notice that after watching these programs for a while, your young relatives mimic the fighting behavior of the characters portrayed in the cartoon (Figure 17).

These sorts of personal observations are what often lead us to formulate a specific hypothesis, but we cannot use limited personal observations and anecdotal evidence to rigorously test our hypothesis. Instead, to find out if real-world data supports our hypothesis, we have to conduct an experiment.
Designing an Experiment
The most basic experimental design involves two groups: the experimental group and the control group. The two groups are designed to be the same except for one difference: the experimental manipulation. The experimental group gets the experimental manipulation—that is, the treatment or variable being tested (in this case, violent TV images)—and the control group does not. Since the experimental manipulation is the only difference between the experimental and control groups, we can be sure that any differences between the two are due to the manipulation rather than chance.
In our example of how violent television programming might affect violent behavior in children, we have the experimental group view violent television programming for a specified time and then measure their violent behavior. It is important for the control group to be treated similarly to the experimental group, with the exception that the control group does not receive the experimental manipulation: the control group watches nonviolent television programming for the same amount of time, and we then measure their violent behavior in the same way.
We also need to precisely define, or operationalize, what is considered violent and nonviolent. An operational definition is a description of how we will measure our variables, and it is important in allowing others to understand exactly how and what a researcher measures in a particular experiment. In operationalizing violent behavior, we might choose to count only physical acts like kicking or punching as instances of this behavior, or we may also choose to include angry verbal exchanges. Whatever we determine, it is important that we operationalize violent behavior in such a way that anyone who hears about our study for the first time knows exactly what we mean by violence. This aids peoples’ ability to interpret our data as well as their capacity to repeat our experiment should they choose to do so.
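One way to make an operational definition fully explicit is to write it down as a coding rule. The act labels below are invented, but the sketch shows how the same list of observed acts yields different counts depending on whether verbal aggression is included in the definition:

```python
# Hypothetical sketch: operationalizing "violent behavior" as an explicit
# coding rule. The act labels are invented for illustration.
PHYSICAL_ACTS = {"kick", "punch", "shove"}
VERBAL_ACTS = {"yell", "insult"}

def count_violent_acts(observed_acts, include_verbal=False):
    """Count the acts that meet the operational definition of violence."""
    violent = PHYSICAL_ACTS | (VERBAL_ACTS if include_verbal else set())
    return sum(act in violent for act in observed_acts)

acts_on_playground = ["kick", "laugh", "punch", "yell", "run", "shove"]

print(count_violent_acts(acts_on_playground))                       # 3
print(count_violent_acts(acts_on_playground, include_verbal=True))  # 4
```

Spelling the rule out this way makes the measure transparent: anyone replicating the study can apply exactly the same definition and know which choice of definition produced which count.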
Once we have operationalized what is considered violent television programming and what is considered violent behavior from our experiment participants, we need to establish how we will run our experiment. In this case, we might have participants watch a 30-minute television program (either violent or nonviolent, depending on their group membership) before sending them out to a playground for an hour where their behavior is observed and the number and type of violent acts is recorded.
Ideally, the people who observe and record the children’s behavior are unaware of who was assigned to the experimental or control group, in order to control for experimenter bias. Experimenter bias refers to the possibility that a researcher’s expectations might skew the results of the study. Remember, conducting an experiment requires a lot of planning, and the people involved in the research project have a vested interest in supporting their hypotheses. If the observers knew which child was in which group, it might influence how much attention they paid to each child’s behavior as well as how they interpreted that behavior. By being blind to which child is in which group, we protect against those biases. This situation is a single-blind study, meaning that the participants are unaware of which group they are in (experimental or control), while the researchers who developed the experiment know which participants are in each group.

In a double-blind study, both the researchers and the participants are blind to group assignments. Why would a researcher want to run a study where no one knows who is in which group? Because by doing so, we can control for both experimenter and participant expectations. If you are familiar with the phrase placebo effect, you already have some idea as to why this is an important consideration. The placebo effect occurs when people’s expectations or beliefs influence or determine their experience in a given situation. In other words, simply expecting something to happen can actually make it happen.
The placebo effect is commonly described in terms of testing the effectiveness of a new medication. Imagine that you work in a pharmaceutical company, and you think you have a new drug that is effective in treating depression. To demonstrate that your medication is effective, you run an experiment with two groups: The experimental group receives the medication, and the control group does not. But you don’t want participants to know whether they received the drug or not.
Why is that? Imagine that you are a participant in this study, and you have just taken a pill that you think will improve your mood. Because you expect the pill to have an effect, you might feel better simply because you took the pill and not because of any drug actually contained in the pill—this is the placebo effect.
To make sure that any effects on mood are due to the drug and not due to expectations, the control group receives a placebo (in this case a sugar pill). Now everyone gets a pill, and once again neither the researcher nor the experimental participants know who got the drug and who got the sugar pill. Any differences in mood between the experimental and control groups can now be attributed to the drug itself rather than to experimenter bias or participant expectations (Figure 18).
Independent and Dependent Variables
In a research experiment, we strive to study whether changes in one thing cause changes in another. To achieve this, we must pay attention to two important variables, or things that can be changed, in any experimental study: the independent variable and the dependent variable. An independent variable is manipulated or controlled by the experimenter. In a well-designed experimental study, the independent variable is the only important difference between the experimental and control groups. In our example of how violent television programs affect children’s display of violent behavior, the independent variable is the type of program—violent or nonviolent—viewed by participants in the study (Figure 19). A dependent variable is what the researcher measures to see how much effect the independent variable had. In our example, the dependent variable is the number of violent acts displayed by the experimental participants.

We expect that the dependent variable will change as a function of the independent variable. In other words, the dependent variable depends on the independent variable. A good way to think about the relationship between the independent and dependent variables is with this question: What effect does the independent variable have on the dependent variable? Returning to our example, what effect does watching a half hour of violent television programming or nonviolent television programming have on the number of incidents of physical aggression displayed on the playground?
Selecting and Assigning Experimental Participants
Now that our study is designed, we need to obtain a sample of individuals to include in our experiment. Our study involves human participants, so we need to determine whom to include. Participants are the subjects of psychological research, and as the name implies, individuals who are involved in psychological research actively participate in the process. Often, psychological research projects rely on college students to serve as participants. In fact, the vast majority of research in psychology subfields has historically involved students as research participants (Sears, 1986; Arnett, 2008). But are college students truly representative of the general population? College students tend to be younger, more educated, more liberal, and less diverse than the general population. Although using students as test subjects is an accepted practice, relying on such a limited pool of research participants can be problematic because it is difficult to generalize findings to the larger population.
Our hypothetical experiment involves children, and we must first generate a sample of child participants. Samples are used because populations are usually too large to reasonably involve every member in our particular experiment (Figure 20). If possible, we should use a random sample (there are other types of samples, but for the purposes of this section, we will focus on random samples). A random sample is a subset of a larger population in which every member of the population has an equal chance of being selected. Random samples are preferred because if the sample is large enough we can be reasonably sure that the participating individuals are representative of the larger population. This means that the percentages of characteristics in the sample—sex, ethnicity, socioeconomic level, and any other characteristics that might affect the results—are close to those percentages in the larger population.
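The "equal chance of being selected" property is exactly what simple random sampling routines provide. A minimal sketch, using invented ID numbers as a stand-in population:

```python
# Hypothetical sketch: drawing a simple random sample without replacement.
import random

random.seed(1)

# Stand-in population: ID numbers for 5,000 children (invented).
population = list(range(5000))

# random.sample gives every member an equal chance of selection and never
# picks the same member twice.
sample = random.sample(population, k=200)

print(len(sample))       # 200 participants
print(len(set(sample)))  # 200 distinct IDs, so no one was sampled twice
```

In practice the hard part is not the draw itself but obtaining a complete list of the population to draw from; a random sample from an incomplete list inherits whatever bias the list has.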
In our example, let’s say we decide our population of interest is fourth graders. But all fourth graders is a very large population, so we need to be more specific; instead we might say our population of interest is all fourth graders in a particular city. We should include students from various income brackets, family situations, races, ethnicities, religions, and geographic areas of town. With this more manageable population, we can work with the local schools in selecting a random sample of around 200 fourth graders who we want to participate in our experiment.
In summary, because we cannot test all of the fourth graders in a city, we want to find a group of about 200 that reflects the composition of that city. With a representative group, we can generalize our findings to the larger population without fear of our sample being biased in some way.

Now that we have a sample, the next step of the experimental process is to split the participants into experimental and control groups through random assignment. With random assignment, all participants have an equal chance of being assigned to either group. There is statistical software that will randomly assign each of the fourth graders in the sample to either the experimental or the control group.
Random assignment is critical for sound experimental design. With sufficiently large samples, random assignment makes it unlikely that there are systematic differences between the groups. So, for instance, it would be very unlikely that we would get one group composed entirely of males, a given ethnic identity, or a given religious ideology. This is important because if the groups were systematically different before the experiment began, we would not know the origin of any differences we find between the groups: Were the differences preexisting, or were they caused by manipulation of the independent variable? Random assignment allows us to assume that any differences observed between experimental and control groups result from the manipulation of the independent variable.
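Statistical software aside, random assignment itself is a simple procedure: shuffle the sample and split it in half, so every participant has an equal chance of landing in either group. A minimal sketch with invented participant IDs:

```python
# Hypothetical sketch: random assignment to experimental and control groups.
import random

random.seed(7)

sample = list(range(200))  # stand-in IDs for the 200 sampled participants
random.shuffle(sample)     # random order, so the split below is unbiased

experimental_group = sample[:100]
control_group = sample[100:]

print(len(experimental_group), len(control_group))   # 100 100
print(set(experimental_group) & set(control_group))  # set(): no overlap
```

Note the distinction from random sampling: sampling decides who enters the study (supporting generalization), while assignment decides which group each participant joins (supporting causal claims).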
Issues to Consider
While experiments allow scientists to make cause-and-effect claims, they are not without problems. True experiments require the experimenter to manipulate an independent variable, and that can complicate many questions that psychologists might want to address. For instance, imagine that you want to know what effect sex (the independent variable) has on spatial memory (the dependent variable). Although you can certainly look for differences between males and females on a task that taps into spatial memory, you cannot directly control a person’s sex. We categorize this type of research approach as quasi-experimental and recognize that we cannot make cause-and-effect claims in these circumstances.
Experimenters are also limited by ethical constraints. For instance, you would not be able to conduct an experiment designed to determine if experiencing abuse as a child leads to lower levels of self-esteem among adults. To conduct such an experiment, you would need to randomly assign some experimental participants to a group that receives abuse, and that experiment would be unethical.
Introduction to Statistical Thinking
Psychologists use statistics to assist them in analyzing data, and also to give more precise measurements to describe whether something is statistically significant. Analyzing data using statistics enables researchers to find patterns, make claims, and share their results with others. In this section, you’ll learn about some of the tools that psychologists use in statistical analysis.
- Define reliability and validity
- Describe the importance of distributional thinking and the role of p-values in statistical inference
- Describe the role of random sampling and random assignment in drawing cause-and-effect conclusions
- Describe the basic structure of a psychological research article
Interpreting Experimental Findings
Once data are collected from both the experimental and the control groups, a statistical analysis is conducted to find out if there are meaningful differences between the two groups. A statistical analysis determines how likely any difference found is due to chance (and thus not meaningful). In psychology, group differences are considered meaningful, or significant, if the odds that these differences occurred by chance alone are 5 percent or less. Stated another way, a significant result means there is a 5 percent or smaller probability that a difference this large would arise by chance alone if no real difference existed between the groups.
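One simple way to estimate such a probability, sketched below with invented scores, is a permutation test: shuffle the group labels many times and ask how often chance alone produces a difference at least as large as the one observed. (Psychologists more often report t-tests or ANOVAs, but the underlying logic is the same.)

```python
# Hypothetical sketch: a permutation test of the difference between two
# groups. All scores are invented for illustration.
import random
import statistics

random.seed(0)

experimental = [6, 8, 7, 9, 8, 7, 9, 8]  # e.g., violent acts after violent TV
control = [4, 5, 6, 5, 4, 6, 5, 5]       # after nonviolent TV

observed = statistics.mean(experimental) - statistics.mean(control)

# Shuffle the pooled scores and re-split them many times; count how often a
# chance split produces a difference as extreme as the observed one.
pooled = experimental + control
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / n_perm
print(f"observed difference = {observed:.2f}")  # 2.75
print(f"p = {p_value:.4f}")
# A p below .05 would be called statistically significant.
```

Here the groups barely overlap, so almost no random relabeling matches the observed difference and the estimated p-value falls well below .05.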
The greatest strength of experiments is the ability to assert that any significant differences in the findings are caused by the independent variable. This occurs because random selection, random assignment, and a design that limits the effects of both experimenter bias and participant expectancy should create groups that are similar in composition and treatment. Therefore, any difference between the groups is attributable to the independent variable, and now we can finally make a causal statement. If we find that watching a violent television program results in more violent behavior than watching a nonviolent program, we can safely say that watching violent television programs causes an increase in the display of violent behavior.
Reporting Research
When psychologists complete a research project, they generally want to share their findings with other scientists. The American Psychological Association (APA) publishes a manual detailing how to write a paper for submission to scientific journals. Unlike an article that might be published in a magazine like Psychology Today, which targets a general audience with an interest in psychology, scientific journals generally publish peer-reviewed journal articles aimed at an audience of professionals and scholars who are actively involved in research themselves.
A peer-reviewed journal article is read by several other scientists (generally anonymously) with expertise in the subject matter. These peer reviewers provide feedback—to both the author and the journal editor—regarding the quality of the draft. Peer reviewers look for a strong rationale for the research being described, a clear description of how the research was conducted, and evidence that the research was conducted in an ethical manner. They also look for flaws in the study’s design, methods, and statistical analyses. They check that the conclusions drawn by the authors seem reasonable given the observations made during the research. Peer reviewers also comment on how valuable the research is in advancing the discipline’s knowledge. This helps prevent unnecessary duplication of research findings in the scientific literature and, to some extent, ensures that each research article provides new information. Ultimately, the journal editor will compile all of the peer reviewer feedback and determine whether the article will be published in its current state (a rare occurrence), published with revisions, or not accepted for publication.
Peer review provides some degree of quality control for psychological research. Poorly conceived or executed studies can be weeded out, and even well-designed research can be improved by the revisions suggested. Peer review also ensures that the research is described clearly enough to allow other scientists to replicate it, meaning they can repeat the experiment using different samples to determine reliability. Sometimes replications involve additional measures that expand on the original finding. In any case, each replication serves to provide more evidence to support the original research findings. Successful replications of published research make scientists more apt to adopt those findings, while repeated failures tend to cast doubt on the legitimacy of the original article and lead scientists to look elsewhere. For example, it would be a major advancement in the medical field if a published study indicated that taking a new drug helped individuals achieve a healthy weight without changing their diet. But if other scientists could not replicate the results, the original study’s claims would be questioned.
Dig Deeper: The Vaccine-Autism Myth and the Retraction of Published Studies
Some scientists have claimed that routine childhood vaccines cause some children to develop autism, and, in fact, several peer-reviewed journals published research making these claims. Since the initial reports, large-scale epidemiological research has suggested that vaccinations are not responsible for causing autism and that it is much safer to have your child vaccinated than not. Furthermore, several of the original studies making this claim have since been retracted.
A published piece of work can be retracted when its data are called into question because of falsification, fabrication, or serious research design problems. Once a work is retracted, the scientific community is informed that there are serious problems with the original publication. Retractions can be initiated by the researcher who led the study, by research collaborators, by the institution that employed the researcher, or by the editorial board of the journal in which the article was originally published. In the vaccine-autism case, the retraction was made because of a significant conflict of interest in which the leading researcher had a financial interest in establishing a link between childhood vaccines and autism (Offit, 2008). Unfortunately, the initial studies received so much media attention that many parents around the world became hesitant to have their children vaccinated (Figure 21). For more information about how the vaccine/autism story unfolded, as well as the repercussions of this story, take a look at Paul Offit’s book, Autism’s False Prophets: Bad Science, Risky Medicine, and the Search for a Cure.

Reliability and Validity
Dig Deeper: Everyday Connection: How Valid Is the SAT?
Standardized tests like the SAT are supposed to measure an individual’s aptitude for a college education, but how reliable and valid are such tests? Research conducted by the College Board suggests that scores on the SAT have high predictive validity for first-year college students’ GPA (Kobrin, Patterson, Shaw, Mattern, & Barbuti, 2008). In this context, predictive validity refers to the test’s ability to effectively predict the GPA of college freshmen. Given that many institutions of higher education require the SAT for admission, this high degree of predictive validity might be comforting.
However, the emphasis placed on SAT scores in college admissions has generated some controversy on a number of fronts. For one, some researchers assert that the SAT is a biased test that places minority students at a disadvantage and unfairly reduces the likelihood of being admitted into a college (Santelices & Wilson, 2010). Additionally, some research has suggested that the predictive validity of the SAT is grossly exaggerated in how well it is able to predict the GPA of first-year college students. In fact, it has been suggested that the SAT’s predictive validity may be overestimated by as much as 150% (Rothstein, 2004). Many institutions of higher education are beginning to consider de-emphasizing the significance of SAT scores in making admission decisions (Rimer, 2008).
In 2014, College Board president David Coleman expressed his awareness of these problems, recognizing that college success is more accurately predicted by high school grades than by SAT scores. To address these concerns, he has called for significant changes to the SAT exam (Lewin, 2014).
Statistical Significance

Does drinking coffee actually increase your life expectancy? A recent study (Freedman, Park, Abnet, Hollenbeck, & Sinha, 2012) found that men who drank at least six cups of coffee a day had a 10% lower chance of dying (women’s chances were 15% lower) than those who drank none. Does this mean you should pick up or increase your own coffee habit? We will explore these results in more depth in the next section about drawing conclusions from statistics. Modern society has become awash in studies such as this; you can read about several such studies in the news every day.
Conducting such a study well, and interpreting its results, requires understanding the basic ideas of statistics, the science of gaining insight from data. The key components of a statistical investigation are:
- Planning the study: Start by asking a testable research question and deciding how to collect data. For example, how long was the study period of the coffee study? How many people were recruited for the study, how were they recruited, and from where? How old were they? What other variables were recorded about the individuals? Were changes made to the participants’ coffee habits during the course of the study?
- Examining the data: What are appropriate ways to examine the data? What graphs are relevant, and what do they reveal? What descriptive statistics can be calculated to summarize relevant aspects of the data, and what do they reveal? What patterns do you see in the data? Are there any individual observations that deviate from the overall pattern, and what do they reveal? For example, in the coffee study, did the proportions differ when we compared the smokers to the non-smokers?
- Inferring from the data: What are valid statistical methods for drawing inferences “beyond” the data you collected? In the coffee study, is the 10%–15% reduction in risk of death something that could have happened just by chance?
- Drawing conclusions: Based on what you learned from your data, what conclusions can you draw? Who do you think these conclusions apply to? (Were the people in the coffee study older? Healthy? Living in cities?) Can you draw a cause-and-effect conclusion about your treatments? (Are scientists now saying that the coffee drinking is the cause of the decreased risk of death?)
Notice that the numerical analysis (“crunching numbers” on the computer) comprises only a small part of overall statistical investigation. In this section, you will see how we can answer some of these questions and what questions you should be asking about any statistical investigation you read about.
Distributional Thinking
When data are collected to address a particular question, an important first step is to think of meaningful ways to organize and examine the data. Let’s take a look at an example.
Example 1 : Researchers investigated whether cancer pamphlets are written at an appropriate level to be read and understood by cancer patients (Short, Moriarty, & Cooley, 1995). Tests of reading ability were given to 63 patients. In addition, readability level was determined for a sample of 30 pamphlets, based on characteristics such as the lengths of words and sentences in the pamphlet. The results, reported in terms of grade levels, are displayed in Figure 23.

- Data vary . More specifically, values of a variable (such as reading level of a cancer patient or readability level of a cancer pamphlet) vary.
- Analyzing the pattern of variation, called the distribution of the variable, often reveals insights.
Addressing the research question of whether the cancer pamphlets are written at appropriate levels for the cancer patients requires comparing the two distributions. A naïve comparison might focus only on the centers of the distributions. Both medians turn out to be ninth grade, but considering only medians ignores the variability and the overall distributions of these data. A more illuminating approach is to compare the entire distributions, for example with a graph, as in Figure 24.

Figure 24 makes clear that the two distributions are not well aligned at all. The most glaring discrepancy is that many patients (17/63, or 27%, to be precise) have a reading level below that of the most readable pamphlet. These patients will need help to understand the information provided in the cancer pamphlets. Notice that this conclusion follows from considering the distributions as a whole, not simply measures of center or variability, and that the graph contrasts those distributions more immediately than the frequency tables.
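The comparison described above can be sketched in a few lines of Python. The grade levels below are illustrative stand-ins, not the actual Short et al. (1995) data; they are chosen so that both medians land at ninth grade while several patients still fall below the easiest pamphlet, mirroring the pattern in the study.

```python
from statistics import median

# Illustrative data only -- the actual grade levels from Short et al. (1995)
# are not reproduced here.
patient_reading = [3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 11, 12, 12, 13]
pamphlet_level = [6, 7, 8, 8, 9, 9, 9, 10, 11, 12]

# Comparing only the centers hides the mismatch: both medians are ninth grade.
print(median(patient_reading), median(pamphlet_level))

# Comparing the whole distributions reveals it: count the patients whose
# reading level falls below that of the most readable pamphlet.
easiest = min(pamphlet_level)
below = sum(1 for r in patient_reading if r < easiest)
print(f"{below}/{len(patient_reading)} patients read below the easiest pamphlet")
```

Even with identical medians, a meaningful fraction of patients cannot read any of the pamphlets, which is exactly the kind of insight that looking at whole distributions provides.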
Finding Significance in Data
Even when we find patterns in data, often there is still uncertainty in various aspects of the data. For example, there may be potential for measurement errors (even your own body temperature can fluctuate by almost 1°F over the course of the day). Or we may only have a “snapshot” of observations from a more long-term process or only a small subset of individuals from the population of interest. In such cases, how can we determine whether the patterns we see in our small set of data are convincing evidence of a systematic phenomenon in the larger process or population? Let’s take a look at another example.
Example 2 : In a study reported in the November 2007 issue of Nature , researchers investigated whether pre-verbal infants take into account an individual’s actions toward others in evaluating that individual as appealing or aversive (Hamlin, Wynn, & Bloom, 2007). In one component of the study, 10-month-old infants were shown a “climber” character (a piece of wood with “googly” eyes glued onto it) that could not make it up a hill in two tries. Then the infants were shown two scenarios for the climber’s next try, one where the climber was pushed to the top of the hill by another character (“helper”), and one where the climber was pushed back down the hill by another character (“hinderer”). The infant was alternately shown these two scenarios several times. Then the infant was presented with two pieces of wood (representing the helper and the hinderer characters) and asked to pick one to play with.
The researchers found that of the 16 infants who made a clear choice, 14 chose to play with the helper toy. One possible explanation for this clear majority result is that the helping behavior of the one toy increases the infants’ likelihood of choosing that toy. But are there other possible explanations? What about the color of the toy? Well, prior to collecting the data, the researchers arranged it so that each color and shape (red square and blue circle) would be seen by the same number of infants. Or maybe the infants had right-handed tendencies and so picked whichever toy was closer to their right hand?
Well, prior to collecting the data, the researchers arranged it so half the infants saw the helper toy on the right and half on the left. Or, maybe the shapes of these wooden characters (square, triangle, circle) had an effect? Perhaps, but again, the researchers controlled for this by rotating which shape was the helper toy, the hinderer toy, and the climber. When designing experiments, it is important to control, as far as possible, for any variables that might affect the responses. It is beginning to appear that the researchers accounted for all the other plausible explanations. But there is one more important consideration that cannot be controlled—if we did the study again with these 16 infants, they might not make the same choices. In other words, there is some randomness inherent in their selection process.
Maybe each infant had no genuine preference at all, and it was simply “random luck” that led to 14 infants picking the helper toy. Although this random component cannot be controlled, we can apply a probability model to investigate the pattern of results that would occur in the long run if random chance were the only factor.
If the infants were equally likely to pick between the two toys, then each infant had a 50% chance of picking the helper toy. It’s like each infant tossed a coin, and if it landed heads, the infant picked the helper toy. So if we tossed a coin 16 times, could it land heads 14 times? Sure, it’s possible, but it turns out to be very unlikely. Getting 14 (or more) heads in 16 tosses is about as likely as tossing a coin and getting 9 heads in a row. This probability is referred to as a p-value. The p-value is the probability of obtaining results at least as extreme as those observed if chance alone were at work. Within psychology, the most common standard for p-values is “p < .05”. What this means is that if chance alone were operating, results this extreme would occur less than 5% of the time, so we take such results as evidence of a meaningful pattern. We call this statistical significance.
So, in the study above, if we assume that each infant was choosing equally, then the probability that 14 or more out of 16 infants would choose the helper toy is found to be 0.0021. We have only two logical possibilities: either the infants have a genuine preference for the helper toy, or the infants have no preference (50/50) and an outcome that would occur only 2 times in 1,000 iterations happened in this study. Because this p-value of 0.0021 is quite small, we conclude that the study provides very strong evidence that these infants have a genuine preference for the helper toy.
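The 0.0021 figure can be reproduced exactly from the coin-tossing (binomial) model using only the Python standard library:

```python
from math import comb

n, k = 16, 14  # 16 infants, 14 of whom chose the helper toy

# Under the null hypothesis (no preference), each infant picks the helper
# toy with probability 1/2, like a fair coin landing heads. The p-value is
# the probability of 14 or more heads in 16 tosses.
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(round(p_value, 4))  # 0.0021, matching the value reported above
```

Because the counts are small, the exact sum is trivial here; for larger studies the same probability would typically be approximated or simulated.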
Comparing the p-value to a cut-off value such as 0.05, we see that the p-value is smaller, so we reject the hypothesis that only random chance was at play here. In this case, these researchers would conclude that significantly more than half of the infants in the study chose the helper toy, giving strong evidence of a genuine preference for the toy with the helping behavior.
Drawing Conclusions from Statistics
Generalizability.

One limitation to the study mentioned previously about the babies choosing the “helper” toy is that the conclusion only applies to the 16 infants in the study. We don’t know much about how those 16 infants were selected. Suppose we want to select a subset of individuals (a sample ) from a much larger group of individuals (the population ) in such a way that conclusions from the sample can be generalized to the larger population. This is the question faced by pollsters every day.
Example 3: The General Social Survey (GSS) is a survey on societal trends conducted every other year in the United States. Based on a sample of about 2,000 adult Americans, researchers make claims about what percentage of the U.S. population consider themselves to be “liberal,” what percentage consider themselves “happy,” what percentage feel “rushed” in their daily lives, and many other issues. The key to making these claims about the larger population of all American adults lies in how the sample is selected. The goal is to select a sample that is representative of the population, and a common way to achieve this goal is to select a random sample that gives every member of the population an equal chance of being selected for the sample. In its simplest form, random sampling involves numbering every member of the population and then using a computer to randomly select the subset to be surveyed. Most polls don’t operate exactly like this, but they do use probability-based sampling methods to select individuals from nationally representative panels.
In 2004, the GSS reported that 817 of 977 respondents (or 83.6%) indicated that they always or sometimes feel rushed. This is a clear majority, but we again need to consider variation due to random sampling. Fortunately, we can use the same probability model we did in the previous example to investigate the probable size of this error. (Note, we can use the coin-tossing model when the actual population size is much, much larger than the sample size, as then we can still consider the probability to be the same for every individual in the sample.) This probability model predicts that the sample result will be within 3 percentage points of the population value (roughly 1 over the square root of the sample size); this is the margin of error. A statistician would conclude, with 95% confidence, that between 80.6% and 86.6% of all adult Americans in 2004 would have responded that they sometimes or always feel rushed.
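The margin-of-error arithmetic for the GSS figures quoted above is easy to sketch. (The text rounds the margin to a flat 3 percentage points; the raw 1-over-root-n value is closer to 3.2.)

```python
from math import sqrt

n = 977    # GSS respondents in 2004
yes = 817  # respondents who said they always or sometimes feel rushed

p_hat = yes / n    # sample proportion, about 83.6%
moe = 1 / sqrt(n)  # rough 95% margin of error, about 3.2 percentage points

low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.1%} +/- {moe:.1%} -> ({low:.1%}, {high:.1%})")
```

The resulting interval is a shade wider than the 80.6%–86.6% quoted above, only because the text rounds the margin down to 3 points before adding and subtracting it.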
The key to the margin of error is that when we use a probability sampling method, we can make claims about how often (in the long run, with repeated random sampling) the sample result would fall within a certain distance from the unknown population value by chance (meaning by random sampling variation) alone. Conversely, non-random samples are often susceptible to bias, meaning the sampling method systematically over-represents some segments of the population and under-represents others. We also still need to consider other sources of bias, such as individuals not responding honestly. These sources of error are not measured by the margin of error.
Cause and Effect
In many research studies, the primary question of interest concerns differences between groups. Then the question becomes how were the groups formed (e.g., selecting people who already drink coffee vs. those who don’t). In some studies, the researchers actively form the groups themselves. But then we have a similar question—could any differences we observe in the groups be an artifact of that group-formation process? Or maybe the difference we observe in the groups is so large that we can discount a “fluke” in the group-formation process as a reasonable explanation for what we find?
Example 4 : A psychology study investigated whether people tend to display more creativity when they are thinking about intrinsic (internal) or extrinsic (external) motivations (Ramsey & Schafer, 2002, based on a study by Amabile, 1985). The subjects were 47 people with extensive experience with creative writing. Subjects began by answering survey questions about either intrinsic motivations for writing (such as the pleasure of self-expression) or extrinsic motivations (such as public recognition). Then all subjects were instructed to write a haiku, and those poems were evaluated for creativity by a panel of judges. The researchers conjectured beforehand that subjects who were thinking about intrinsic motivations would display more creativity than subjects who were thinking about extrinsic motivations. The creativity scores from the 47 subjects in this study are displayed in Figure 26, where higher scores indicate more creativity.

In this example, the key question is whether the type of motivation affects creativity scores. In particular, do subjects who were asked about intrinsic motivations tend to have higher creativity scores than subjects who were asked about extrinsic motivations?
Figure 26 reveals that both motivation groups saw considerable variability in creativity scores, and these scores have considerable overlap between the groups. In other words, it’s certainly not always the case that those with intrinsic motivations have higher creativity scores than those with extrinsic motivations, but there may still be a statistical tendency in this direction. (Psychologist Keith Stanovich (2013) refers to people’s difficulties with thinking about such probabilistic tendencies as “the Achilles heel of human cognition.”)
The mean creativity score is 19.88 for the intrinsic group, compared to 15.74 for the extrinsic group, which supports the researchers’ conjecture. Yet comparing only the means of the two groups fails to consider the variability of creativity scores in the groups. We can measure variability with statistics using, for instance, the standard deviation: 5.25 for the extrinsic group and 4.40 for the intrinsic group. The standard deviations tell us that most of the creativity scores are within about 5 points of the mean score in each group. We see that the mean score for the intrinsic group lies within one standard deviation of the mean score for the extrinsic group. So, although there is a tendency for the creativity scores to be higher in the intrinsic group, on average, the difference is not extremely large.
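These group summaries are straightforward to compute with Python's statistics module. The scores below are made up for illustration; the study's raw data are not reproduced here.

```python
from statistics import mean, stdev

# Hypothetical creativity scores (not the actual Amabile data).
intrinsic = [22, 18, 25, 17, 20, 24, 16, 21]
extrinsic = [14, 19, 12, 17, 15, 20, 13, 16]

# Report the center and spread of each group, as in the analysis above.
for name, scores in [("intrinsic", intrinsic), ("extrinsic", extrinsic)]:
    print(f"{name}: mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")
```

As in the study, the interesting question is not just whether the means differ but whether the difference is large relative to the spread within each group.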
We again want to consider possible explanations for this difference. The study only involved individuals with extensive creative writing experience. Although this limits the population to which we can generalize, it does not explain why the mean creativity score was a bit larger for the intrinsic group than for the extrinsic group. Maybe women tend to receive higher creativity scores? Here is where we need to focus on how the individuals were assigned to the motivation groups. If only women were in the intrinsic motivation group and only men in the extrinsic group, then this would present a problem because we wouldn’t know if the intrinsic group did better because of the different type of motivation or because they were women. However, the researchers guarded against such a problem by randomly assigning the individuals to the motivation groups. Like flipping a coin, each individual was just as likely to be assigned to either type of motivation. Why is this helpful? Because this random assignment tends to balance out all the variables related to creativity we can think of, and even those we don’t think of in advance, between the two groups. So we should have a similar male/female split between the two groups; we should have a similar age distribution between the two groups; we should have a similar distribution of educational background between the two groups; and so on. Random assignment should produce groups that are as similar as possible except for the type of motivation, which presumably eliminates all those other variables as possible explanations for the observed tendency for higher scores in the intrinsic group.
But does this always work? No; just by “luck of the draw” the groups may be a little different prior to answering the motivation survey. So then the question is, is it possible that an unlucky random assignment is responsible for the observed difference in creativity scores between the groups? In other words, suppose each individual’s poem was going to get the same creativity score no matter which group they were assigned to, so that the type of motivation in no way impacted their score. Then how often would the random-assignment process alone lead to a difference in mean creativity scores as large as (or larger than) 19.88 – 15.74 = 4.14 points?
We again want to apply a probability model to approximate a p-value, but this time the model will be a bit different. Think of writing everyone’s creativity scores on index cards, shuffling the cards, dealing out 23 to the extrinsic motivation group and 24 to the intrinsic motivation group, and finding the difference in the group means. We (better yet, the computer) can repeat this process over and over to see how often, when the scores don’t change, random assignment leads to a difference in means at least as large as 4.14. Figure 27 shows the results from 1,000 such hypothetical random assignments for these scores.

Only 2 of the 1,000 simulated random assignments produced a difference in group means of 4.14 or larger. In other words, the approximate p-value is 2/1000 = 0.002. This small p-value indicates that it would be very surprising for the random assignment process alone to produce such a large difference in group means. Therefore, as with Example 2, we have strong evidence that focusing on intrinsic motivations tends to increase creativity scores, as compared to thinking about extrinsic motivations.
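The "shuffle the index cards" procedure described above is a randomization (permutation) test, and it can be sketched in a few lines. The scores below are hypothetical stand-ins for the study's data (the real study had 23 and 24 subjects per group), so the exact p-value will differ from 0.002, but the logic is identical.

```python
import random
from statistics import mean

random.seed(1)  # make the simulation reproducible

# Hypothetical creativity scores -- the study's raw data are not shown here.
intrinsic = [22, 18, 25, 17, 20, 24, 16, 21, 23, 19, 26, 18]
extrinsic = [14, 19, 12, 17, 15, 20, 13, 16, 11, 18, 14, 15]

observed = mean(intrinsic) - mean(extrinsic)
pool = intrinsic + extrinsic

# "Shuffle the index cards" and re-deal them into two groups, many times,
# counting how often chance alone matches or beats the observed difference.
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pool)
    diff = mean(pool[:len(intrinsic)]) - mean(pool[len(intrinsic):])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed diff = {observed:.2f}, approximate p-value = {p_value:.4f}")
```

Because the two made-up groups differ by several points, very few shuffles reproduce the observed gap, so the approximate p-value comes out small, just as in the study.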
Notice that the previous statement implies a cause-and-effect relationship between motivation and creativity score; is such a strong conclusion justified? Yes, because of the random assignment used in the study. That should have balanced out any other variables between the two groups, so now that the small p-value convinces us that the higher mean in the intrinsic group wasn’t just a coincidence, the only reasonable explanation left is the difference in the type of motivation. Can we generalize this conclusion to everyone? Not necessarily—we could cautiously generalize this conclusion to individuals with extensive experience in creative writing similar to the individuals in this study, but we would still want to know more about how these individuals were selected to participate.

Statistical thinking involves the careful design of a study to collect meaningful data to answer a focused research question, detailed analysis of patterns in the data, and drawing conclusions that go beyond the observed data. Random sampling is paramount to generalizing results from our sample to a larger population, and random assignment is key to drawing cause-and-effect conclusions. With both kinds of randomness, probability models help us assess how much random variation we can expect in our results, in order to determine whether our results could happen by chance alone and to estimate a margin of error.
So where does this leave us with regard to the coffee study mentioned previously (Freedman, Park, Abnet, Hollenbeck, & Sinha, 2012), which found that men who drank at least six cups of coffee a day had a 10% lower chance of dying (women, 15% lower) than those who drank none? We can answer many of the questions:
- This was a 14-year study conducted by researchers at the National Cancer Institute.
- The results were published in the June issue of the New England Journal of Medicine , a respected, peer-reviewed journal.
- The study reviewed coffee habits of more than 402,000 people ages 50 to 71 from six states and two metropolitan areas. Those with cancer, heart disease, and stroke were excluded at the start of the study. Coffee consumption was assessed once at the start of the study.
- About 52,000 people died during the course of the study.
- People who drank between two and five cups of coffee daily showed a lower risk as well, but the amount of reduction increased for those drinking six or more cups.
- The sample sizes were fairly large and so the p-values are quite small, even though percent reduction in risk was not extremely large (dropping from a 12% chance to about 10%–11%).
- Whether coffee was caffeinated or decaffeinated did not appear to affect the results.
- This was an observational study, so no cause-and-effect conclusions can be drawn between coffee drinking and increased longevity, contrary to the impression conveyed by many news headlines about this study. In particular, it’s possible that those with chronic diseases don’t tend to drink coffee.
This study needs to be reviewed in the larger context of similar studies and consistency of results across studies, with the constant caution that this was not a randomized experiment. Whereas a statistical analysis can still “adjust” for other potential confounding variables, we are not yet convinced that researchers have identified them all or completely isolated why this decrease in death risk is evident. Researchers can now take the findings of this study and develop more focused studies that address new questions.
Explore these outside resources to learn more about applied statistics:
- Video about p-values: P-Value Extravaganza
- Interactive web applets for teaching and learning statistics
- Inter-university Consortium for Political and Social Research where you can find and analyze data.
- The Consortium for the Advancement of Undergraduate Statistics
- Find a recent research article in your field and answer the following: What was the primary research question? How were individuals selected to participate in the study? Were summary results provided? How strong is the evidence presented in favor or against the research question? Was random assignment used? Summarize the main conclusions from the study, addressing the issues of statistical significance, statistical confidence, generalizability, and cause and effect. Do you agree with the conclusions drawn from this study, based on the study design and the results presented?
- Is it reasonable to use a random sample of 1,000 individuals to draw conclusions about all U.S. adults? Explain why or why not.
How to Read Research
In this course and throughout your academic career, you’ll be reading journal articles (meaning they were published by experts in a peer-reviewed journal) and reports that explain psychological research. It’s important to understand the format of these articles so that you can read them strategically and understand the information presented. Scientific articles vary in content or structure, depending on the type of journal to which they will be submitted. Psychological articles and many papers in the social sciences follow the writing guidelines and format dictated by the American Psychological Association (APA). In general, the structure follows: abstract, introduction, methods, results, discussion, and references.
- Abstract : the abstract is the concise summary of the article. It summarizes the most important features of the manuscript, providing the reader with a global first impression on the article. It is generally just one paragraph that explains the experiment as well as a short synopsis of the results.
- Introduction : this section provides background information about the origin and purpose of performing the experiment or study. It reviews previous research and presents existing theories on the topic.
- Method : this section covers the methodologies used to investigate the research question, including the identification of participants , procedures , and materials as well as a description of the actual procedure . It should be sufficiently detailed to allow for replication.
- Results : the results section presents key findings of the research, including reference to indicators of statistical significance.
- Discussion : this section provides an interpretation of the findings, states their significance for current research, and derives implications for theory and practice. Alternative interpretations of the findings are also provided, particularly when it is not possible to determine the directionality of the effects. In the discussion, authors also acknowledge the strengths and limitations/weaknesses of the study and offer concrete directions for future research.
Watch this 3-minute video for an explanation of how to read scholarly articles. Look closely at the example article shared just before the two-minute mark.
https://digitalcommons.coastal.edu/kimbel-library-instructional-videos/9/
Practice identifying these key components in the following experiment: Food-Induced Emotional Resonance Improves Emotion Recognition.
In this chapter, you learned to
- define and apply the scientific method to psychology
- describe the strengths and weaknesses of descriptive, experimental, and correlational research
- define the basic elements of a statistical investigation
Putting It Together: Psychological Research
Psychologists use the scientific method to examine human behavior and mental processes. Some of the methods you learned about include descriptive, experimental, and correlational research designs.
Watch the CrashCourse video to review the material you learned, then read through the following examples and see if you can come up with your own design for each type of study.
You can view the transcript for “Psychological Research: Crash Course Psychology #2” here (opens in new window).
Case Study: a detailed analysis of a particular person, group, business, event, etc. This approach is commonly used to learn more about rare examples with the goal of describing that particular thing.
- Ted Bundy was one of America’s most notorious serial killers; he murdered at least 30 women and was executed in 1989. Dr. Al Carlisle evaluated Bundy when he was first arrested and conducted a psychological analysis of how Bundy’s sexual fantasies developed and merged into reality (Ramsland, 2012). Carlisle believes that a gradual evolution of three processes guided his actions: fantasy, dissociation, and compartmentalization (Ramsland, 2012). Read Imagining Ted Bundy (http://goo.gl/rGqcUv) for more information on this case study.
Naturalistic Observation : a researcher unobtrusively collects information without the participant’s awareness.
- Drain and Engelhardt (2013) observed the evoked and spontaneous communicative acts of six nonverbal children with autism. Each of the children attended a school for children with autism and was in a different class. Each child was observed for 30 minutes of each school day. By observing these children without their knowledge, the researchers were able to see true communicative acts without any external influences.
Survey : participants are asked to provide information or responses to questions on a survey or structured assessment.
- Educational psychologists can ask students to report their grade point average and what, if anything, they eat for breakfast on an average day. A healthy breakfast has been associated with better academic performance (Digangi, 1999).
- Anderson (1987) examined the relationship between uncomfortably hot temperatures and aggressive behavior in two studies of violent and nonviolent crime. Based on previous research by Anderson and Anderson (1984), it was predicted that violent crimes would be more prevalent during the hotter times of year and in hotter years generally. The studies confirmed this prediction.
Longitudinal Study: researchers recruit a sample of participants and track them for an extended period of time.
- In a study of a representative sample of 856 children Eron and his colleagues (1972) found that a boy’s exposure to media violence at age eight was significantly related to his aggressive behavior ten years later, after he graduated from high school.
Cross-Sectional Study: researchers gather participants from different groups (commonly different ages) and look for differences between the groups.
- In 1996, Russell surveyed people of varying age groups and found that people in their 20s tend to report being more lonely than people in their 70s.
Correlational Design: two different variables are measured to determine whether there is a relationship between them.
- Thornhill et al. (2003) had people rate how physically attractive they found other people to be. They then had them separately smell t-shirts those people had worn (without knowing which clothes belonged to whom) and rate how good or bad their body odor was. They found that the more attractive someone was, the more pleasant their body odor was rated to be.
Experimental Design: one variable (the independent variable) is manipulated while another (the dependent variable) is measured, allowing researchers to determine cause and effect.
- Clinical psychologists can test a new pharmaceutical treatment for depression by giving some patients the new pill and others an already-tested one to see which is the more effective treatment.
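As a concrete illustration of a correlational analysis like the Thornhill et al. example, Pearson's r can be computed directly from paired ratings. The numbers below are made up for illustration, not data from the study:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 1-10 ratings in the spirit of the Thornhill et al. example:
attractiveness = [2, 4, 5, 7, 8, 9]
odor_pleasantness = [3, 3, 5, 6, 8, 8]

r = pearson_r(attractiveness, odor_pleasantness)
print(round(r, 2))  # a strong positive correlation (r close to +1)
```

A positive r near +1 means the two ratings rise together; remember that no value of r, however strong, licenses a cause-and-effect conclusion on its own.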
American Cancer Society. (n.d.). History of the cancer prevention studies. Retrieved from http://www.cancer.org/research/researchtopreventcancer/history-cancer-prevention-study
American Psychological Association. (2009). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: Author.
American Psychological Association. (n.d.). Research with animals in psychology. Retrieved from https://www.apa.org/research/responsible/research-animals.pdf
Arnett, J. (2008). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63(7), 602–614.
Barton, B. A., Eldridge, A. L., Thompson, D., Affenito, S. G., Striegel-Moore, R. H., Franko, D. L., . . . Crockett, S. J. (2005). The relationship of breakfast and cereal consumption to nutrient intake and body mass index: The national heart, lung, and blood institute growth and health study. Journal of the American Dietetic Association, 105(9), 1383–1389. Retrieved from http://dx.doi.org/10.1016/j.jada.2005.06.003
Chwalisz, K., Diener, E., & Gallagher, D. (1988). Autonomic arousal feedback and emotional experience: Evidence from the spinal cord injured. Journal of Personality and Social Psychology, 54, 820–828.
Dominus, S. (2011, May 25). Could conjoined twins share a mind? New York Times Sunday Magazine. Retrieved from http://www.nytimes.com/2011/05/29/magazine/could-conjoined-twins-share-a-mind.html?_r=5&hp&
Fanger, S. M., Frankel, L. A., & Hazen, N. (2012). Peer exclusion in preschool children’s play: Naturalistic observations in a playground setting. Merrill-Palmer Quarterly, 58, 224–254.
Fiedler, K. (2004). Illusory correlation. In R. F. Pohl (Ed.), Cognitive illusions: A handbook on fallacies and biases in thinking, judgment and memory (pp. 97–114). New York, NY: Psychology Press.
Frantzen, L. B., Treviño, R. P., Echon, R. M., Garcia-Dominic, O., & DiMarco, N. (2013). Association between frequency of ready-to-eat cereal consumption, nutrient intakes, and body mass index in fourth- to sixth-grade low-income minority children. Journal of the Academy of Nutrition and Dietetics, 113(4), 511–519.
Harper, J. (2013, July 5). Ice cream and crime: Where cold cuisine and hot disputes intersect. The Times-Picaune. Retrieved from http://www.nola.com/crime/index.ssf/2013/07/ice_cream_and_crime_where_hot.html
Jenkins, W. J., Ruppel, S. E., Kizer, J. B., Yehl, J. L., & Griffin, J. L. (2012). An examination of post 9-11 attitudes towards Arab Americans. North American Journal of Psychology, 14, 77–84.
Jones, J. M. (2013, May 13). Same-sex marriage support solidifies above 50% in U.S. Gallup Politics. Retrieved from http://www.gallup.com/poll/162398/sex-marriage-support-solidifies-above.aspx
Kobrin, J. L., Patterson, B. F., Shaw, E. J., Mattern, K. D., & Barbuti, S. M. (2008). Validity of the SAT for predicting first-year college grade point average (Research Report No. 2008-5). Retrieved from https://research.collegeboard.org/sites/default/files/publications/2012/7/researchreport-2008-5-validity-sat-predicting-first-year-college-grade-point-average.pdf
Lewin, T. (2014, March 5). A new SAT aims to realign with schoolwork. New York Times. Retrieved from http://www.nytimes.com/2014/03/06/education/major-changes-in-sat-announced-by-college-board.html.
Lowry, M., Dean, K., & Manders, K. (2010). The link between sleep quantity and academic performance for the college student. Sentience: The University of Minnesota Undergraduate Journal of Psychology, 3(Spring), 16–19. Retrieved from http://www.psych.umn.edu/sentience/files/SENTIENCE_Vol3.pdf
McKie, R. (2010, June 26). Chimps with everything: Jane Goodall’s 50 years in the jungle. The Guardian. Retrieved from http://www.theguardian.com/science/2010/jun/27/jane-goodall-chimps-africa-interview
Offit, P. (2008). Autism’s false prophets: Bad science, risky medicine, and the search for a cure. New York: Columbia University Press.
Perkins, H. W., Haines, M. P., & Rice, R. (2005). Misperceiving the college drinking norm and related problems: A nationwide study of exposure to prevention information, perceived norms and student alcohol misuse. J. Stud. Alcohol, 66(4), 470–478.
Rimer, S. (2008, September 21). College panel calls for less focus on SATs. The New York Times. Retrieved from http://www.nytimes.com/2008/09/22/education/22admissions.html?_r=0
Rothstein, J. M. (2004). College performance predictions and the SAT. Journal of Econometrics, 121, 297–317.
Rotton, J., & Kelly, I. W. (1985). Much ado about the full moon: A meta-analysis of lunar-lunacy research. Psychological Bulletin, 97(2), 286–306. doi:10.1037/0033-2909.97.2.286
Santelices, M. V., & Wilson, M. (2010). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Education Review, 80, 106–134.
Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology’s view of human nature. Journal of Personality and Social Psychology, 51, 515–530.
Tuskegee University. (n.d.). About the USPHS Syphilis Study. Retrieved from http://www.tuskegee.edu/about_us/centers_of_excellence/bioethics_center/about_the_usphs_syphilis_study.aspx.
CC licensed content, Original
- Psychological Research Methods. Provided by : Karenna Malavanti. License : CC BY-SA: Attribution ShareAlike
CC licensed content, Shared previously
- Psychological Research. Provided by : OpenStax College. License : CC BY: Attribution . License Terms : Download for free at https://openstax.org/books/psychology-2e/pages/1-introduction. Located at : https://openstax.org/books/psychology-2e/pages/2-introduction .
- Why It Matters: Psychological Research. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/introduction-15/
- Introduction to The Scientific Method. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/outcome-the-scientific-method/
- Research picture. Authored by : Mediterranean Center of Medical Sciences. Provided by : Flickr. License : CC BY: Attribution Located at : https://www.flickr.com/photos/mcmscience/17664002728 .
- The Scientific Process. Provided by : Lumen Learning. License : CC BY-SA: Attribution ShareAlike Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/reading-the-scientific-process/
- Ethics in Research. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/ethics/
- Ethics. Authored by : OpenStax College. Located at : https://openstax.org/books/psychology-2e/pages/2-4-ethics . License : CC BY: Attribution . License Terms : Download for free at https://openstax.org/books/psychology-2e/pages/1-introduction .
- Introduction to Approaches to Research. Provided by : Lumen Learning. License : CC BY-NC-SA: Attribution NonCommercial ShareAlike Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/outcome-approaches-to-research/
- Lec 2 | MIT 9.00SC Introduction to Psychology, Spring 2011. Authored by : John Gabrieli. Provided by : MIT OpenCourseWare. License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike Located at : https://www.youtube.com/watch?v=syXplPKQb_o .
- Paragraph on correlation. Authored by : Christie Napa Scollon. Provided by : Singapore Management University. License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike Located at : http://nobaproject.com/modules/research-designs?r=MTc0ODYsMjMzNjQ%3D . Project : The Noba Project.
- Descriptive Research. Provided by : Lumen Learning. License : CC BY-SA: Attribution ShareAlike Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/reading-clinical-or-case-studies/
- Approaches to Research. Authored by : OpenStax College. License : CC BY: Attribution . License Terms : Download for free at https://openstax.org/books/psychology-2e/pages/1-introduction. Located at : https://openstax.org/books/psychology-2e/pages/2-2-approaches-to-research
- Analyzing Findings. Authored by : OpenStax College. Located at : https://openstax.org/books/psychology-2e/pages/2-3-analyzing-findings . License : CC BY: Attribution . License Terms : Download for free at https://openstax.org/books/psychology-2e/pages/1-introduction.
- Experiments. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/reading-conducting-experiments/
- Research Review. Authored by : Jessica Traylor for Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/reading-conducting-experiments/
- Introduction to Statistics. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/outcome-statistical-thinking/
- histogram. Authored by : Fisher’s Iris flower data set. Provided by : Wikipedia. License : CC BY-SA: Attribution-ShareAlike Located at : https://en.wikipedia.org/wiki/Wikipedia:Meetup/DC/Statistics_Edit-a-thon#/media/File:Fisher_iris_versicolor_sepalwidth.svg .
- Statistical Thinking. Authored by : Beth Chance and Allan Rossman. Provided by : California Polytechnic State University, San Luis Obispo. License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike . License Terms : http://nobaproject.com/license-agreement Located at : http://nobaproject.com/modules/statistical-thinking . Project : The Noba Project.
- Drawing Conclusions from Statistics. Authored by: Pat Carroll and Lumen Learning. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/reading-drawing-conclusions-from-statistics/
- Statistical Thinking. Authored by : Beth Chance and Allan Rossman, California Polytechnic State University, San Luis Obispo. Provided by : Noba. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike Located at : http://nobaproject.com/modules/statistical-thinking .
- The Replication Crisis. Authored by : Colin Thomas William. Provided by : Ivy Tech Community College. License: CC BY: Attribution
- How to Read Research. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/how-to-read-research/
- What is a Scholarly Article? Kimbel Library First Year Experience Instructional Videos. 9. Authored by: Joshua Vossler, John Watts, and Tim Hodge. Provided by : Coastal Carolina University License : CC BY NC ND: Attribution-NonCommercial-NoDerivatives Located at : https://digitalcommons.coastal.edu/kimbel-library-instructional-videos/9/
- Putting It Together: Psychological Research. Provided by : Lumen Learning. License : CC BY: Attribution Located at: https://pressbooks.online.ucf.edu/lumenpsychology/chapter/putting-it-together-psychological-research/
- Research. Provided by : Lumen Learning. License : CC BY: Attribution Located at:
All rights reserved content
- Understanding Driver Distraction. Provided by : American Psychological Association. License : Other. License Terms: Standard YouTube License Located at : https://www.youtube.com/watch?v=XToWVxS_9lA&list=PLxf85IzktYWJ9MrXwt5GGX3W-16XgrwPW&index=9 .
- Correlation vs. Causality: Freakonomics Movie. License : Other. License Terms : Standard YouTube License Located at : https://www.youtube.com/watch?v=lbODqslc4Tg.
- Psychological Research – Crash Course Psychology #2. Authored by : Hank Green. Provided by : Crash Course. License : Other. License Terms : Standard YouTube License Located at : https://www.youtube.com/watch?v=hFV71QPvX2I .
Public domain content
- Researchers review documents. Authored by : National Cancer Institute. Provided by : Wikimedia. Located at : https://commons.wikimedia.org/wiki/File:Researchers_review_documents.jpg . License : Public Domain: No Known Copyright
grounded in objective, tangible evidence that can be observed time and time again, regardless of who is observing
well-developed set of ideas that propose an explanation for observed phenomena
(plural: hypotheses) tentative and testable statement about the relationship between two or more variables
an experiment must be replicable by another researcher
implies that a theory should enable us to make predictions about future events
able to be disproven by experimental results
implies that all data must be considered when evaluating a hypothesis
committee of administrators, scientists, and community members that reviews proposals for research involving human participants
process of informing a research participant about what to expect during an experiment, any risks involved, and the implications of the research, and then obtaining the person’s consent to participate
purposely misleading experiment participants in order to maintain the integrity of the experiment
when an experiment involved deception, participants are told complete and truthful information about the experiment at its conclusion
committee of administrators, scientists, veterinarians, and community members that reviews proposals for research involving non-human animals
research studies that do not test specific relationships between variables
research investigating the relationship between two or more variables
research method that uses hypothesis testing to make inferences about how one variable impacts and causes another
observation of behavior in its natural setting
inferring that the results for a sample apply to the larger population
when observations may be skewed to align with observer expectations
measure of agreement among observers on how they record and classify a particular event
observational research study focusing on one or a few people
list of questions to be answered by research participants—given as paper-and-pencil questionnaires, administered electronically, or conducted verbally—allowing researchers to collect data from a large number of people
subset of individuals selected from the larger population
overall group of individuals that the researchers are interested in
method of research using past records or data sets to answer various research questions, or to search for interesting patterns or relationships
studies in which the same group of individuals is surveyed or measured repeatedly over an extended period of time
compares multiple segments of a population at a single time
reduction in number of research participants as some drop out of the study over time
relationship between two or more variables; when two variables are correlated, one variable changes as the other does
number from -1 to +1, indicating the strength and direction of the relationship between variables, and usually represented by r
two variables change in the same direction, both becoming either larger or smaller
two variables change in different directions, with one becoming larger as the other becomes smaller; a negative correlation is not the same thing as no correlation
changes in one variable cause the changes in the other variable; can be determined only through an experimental research design
unanticipated outside factor that affects both variables of interest, often giving the false impression that changes in one variable cause changes in the other variable, when, in actuality, the outside factor causes changes in both variables
seeing relationships between two things when in reality no such relationship exists
tendency to ignore evidence that disproves ideas or beliefs
group designed to answer the research question; experimental manipulation is the only difference between the experimental and control groups, so any differences between the two are due to experimental manipulation rather than chance
serves as a basis for comparison and controls for chance factors that might influence the results of the study—by holding such factors constant across groups so that the experimental manipulation is the only difference between groups
description of what actions and operations will be used to measure the dependent variables and manipulate the independent variables
researcher expectations skew the results of the study
experiment in which the researcher knows which participants are in the experimental group and which are in the control group
experiment in which both the researchers and the participants are blind to group assignments
people's expectations or beliefs influencing or determining their experience in a given situation
variable that is influenced or controlled by the experimenter; in a sound experimental study, the independent variable is the only important difference between the experimental and control group
variable that the researcher measures to see how much effect the independent variable had
subjects of psychological research
subset of a larger population in which every member of the population has an equal chance of being selected
method of experimental group assignment in which all participants have an equal chance of being assigned to either group
consistency and reproducibility of a given result
accuracy of a given result in measuring what it is designed to measure
determines how likely any difference between experimental groups is due to chance
statistical probability that represents the likelihood that experimental results happened by chance
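The final two definitions above describe statistical analysis and the p-value. One intuitive way to see what a p-value measures is a randomization (permutation) test: shuffle the group labels many times and count how often chance alone produces a difference as large as the one observed. The scores below are hypothetical:

```python
import random

def perm_p_value(group_a, group_b, n_sims=10000, seed=42):
    """Estimate a p-value by shuffling group labels and asking how often
    chance alone produces a difference as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)
        sim_a, sim_b = pooled[:len(group_a)], pooled[len(group_a):]
        diff = abs(sum(sim_a) / len(sim_a) - sum(sim_b) / len(sim_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_sims

# Hypothetical memory-test scores for treatment vs. control groups:
treatment = [88, 92, 79, 91, 85, 90, 87, 94]
control = [81, 78, 84, 75, 80, 83, 77, 82]
print(perm_p_value(treatment, control))  # a small p: unlikely to be chance alone
```

A small p-value says only that a difference this large would rarely arise by chance; it does not by itself establish that the difference is large or important.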
Psychological Science is the scientific study of mind, brain, and behavior. We will explore what it means to be human in this class. It has never been more important for us to understand what makes people tick, how to evaluate information critically, and the importance of history. Psychology can also help you in your future career; indeed, there are very few jobs out there with no human interaction!
Because psychology is a science, we analyze human behavior through the scientific method. There are several ways to investigate human phenomena, such as observation, experiments, and more. We will discuss the basics of each, along with its pros and cons! We will also dig deeper into the important ethical guidelines that psychologists must follow in order to do research. Lastly, we will briefly introduce statistics, the language of scientific research. While reading the content in these chapters, try to find examples of material that fit the themes of the course.
To get us started:
- The study of the mind moved away from introspection toward reaction-time studies as we learned more about empiricism
- Psychologists work in careers outside of the typical "clinician" role. We advise in human factors, education, policy, and more!
- While completing an observational study, psychologists work to aggregate common themes to explain the behavior of the group (sample) as a whole. In doing so, we still allow for normal variation within the group!
- The IRB and IACUC are important in ensuring ethics are maintained for both human and animal subjects
Psychological Science: Understanding Human Behavior Copyright © by Karenna Malavanti is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
- Open access
- Published: 04 December 2023
Research on the detection model of mental illness of online forum users based on convolutional network
- Yuliang Guo 1 ,
- Zheng Zhang 2 &
- Xuejun Xu 3
BMC Psychology, volume 11, Article number: 424 (2023)
There are now more than 4.62 billion social media users worldwide, and a large number of them publish personal emotional updates or express opinions on social media. These massive user data support the development of mental illness detection research, which has achieved good results. However, current mental illness detection models struggle to accurately identify key emotional features in the large number of posts a user publishes. Because existing models cannot accurately extract the words with high emotional contribution from the content of user posts, this paper proposes two hierarchical user-post feature representation models, named Single-Gated LeakyReLU-CNN (SGL-CNN) and Multi-Gated LeakyReLU-CNN (MGL-CNN). We leverage these two models to identify users with mental illness in online forums. Across all posts published by each user within a certain time span, the proposed models identify the key emotional features and filter out other unimportant information as much as possible. In addition, the gating units significantly improve performance on the emotion detection task. Experimental results on the RSDD dataset show that the proposed models outperform existing methods.
Introduction
The development of the times and ever-fiercer social competition confront people with various pressures in life, study, relationships, and employment. Under stress, a person's psychological state is prone to change, producing various psychological abnormalities; if these are not adjusted in time, they can develop into serious emotional disorders. Depression is one of the most typical and common mood disorders and has received much attention in the field of mental health. According to available research, approximately 280 million people suffer from depression worldwide, including 5.0% of adults [1]. Among mental illnesses, depression is the most common, the second-largest disease burden on global public health, and a main contributor to the total global disease burden. After the COVID-19 epidemic, the global burden of mental disorders grew heavier: cases of major depressive disorder and anxiety disorder increased by 28% and 26% respectively, and the number of patients with depression surged by 53 million, an increase of 27.6% [2, 3]. According to the World Health Organization, the global incidence of depression is about 11%, and it is estimated that by 2030 depression will become the main cause of disability worldwide [4]. The authors of [5] conducted a cross-sectional epidemiological survey of the prevalence of mental disorders at 157 monitoring points in 31 provinces in China. The results showed that over the past 30 years the lifetime prevalence of depression in China was 6.8%, and the 12-month prevalence was 3.6%. Suicidal ideation is one of the main symptoms in a depression diagnosis: about two-thirds of depressed patients have experienced suicidal ideation, and about 25% have exhibited suicidal behavior. In China, 287,000 people die by suicide every year; 63% of those who die by suicide have a mental disorder, and 40% suffer from depression [6].
Depression not only causes great harm to oneself, but also imposes a heavy economic burden on society. The direct and indirect economic losses caused by depression in China are as high as 80 billion dollars per year [ 7 ].
At present, most research on mental illness detection focuses on depression. As social media continues to grow, more and more users suffering from mental illness turn to online forums to express their mental health concerns and seek help and treatment. For example, online support forums like Reddit host many self-reported depression patients, who provide a great deal of negative emotional information for scholars to study [8]. Traditional mental illness detection usually requires patients to seek help from a psychologist on their own initiative. However, because mental health services are underdeveloped in remote and poorer areas and diagnosis is relatively costly, many people cannot get a timely diagnosis and miss the best window for treatment. Automatic diagnosis and identification of depressed patients based on human emotion has therefore become key to the prevention and treatment of depression.
With the gradual popularity of social media and the continuous development of computer technology, the number of users joining social networks keeps rising. More and more netizens are inclined to express their opinions and emotional states on social media, and more and more personal posting information can be collected on the Internet. Social media has gradually become an important way for the public to share the latest emotional information and discuss hot topics of public opinion, and an effective channel for obtaining information. On social media, mental health issues have gradually become a hot topic in the field of health communication. How to mine potential medical value and useful information from these massive personal data, while devoting sufficient attention and appropriate auxiliary medical treatment to those who need it, is a challenging and socially significant research problem. Using social media data for depression detection can maximize the convenience of big data and more effectively identify people with potential mental illness, a group that may be undiagnosed yet still at high risk of disease.
Social media attracts a large number of users to express their true inner feelings for two main reasons. First, social media provides users with a relatively closed and safe environment: because of its anonymity, people are more willing to express their opinions online, as they can drop their defenses and burdens, and the cost of speaking is significantly reduced. Second, due to the virtuality and invisibility of the Internet, users do not have to deliberately maintain their real-world images and can express their true inner feelings more freely. Therefore, by applying sentiment analysis algorithms to the strongly negative feature words users post on social media, users with potential psychological problems can be discovered as early as possible. This allows early intervention for potential patients, helping to prevent psychological problems from worsening and people from failing to seek treatment [9]. In addition, personal language plays a very important role in revealing mental illness [10]. During psychotherapy, doctors can make a diagnosis through the patient's language, quickly identifying mental illnesses including depression, anorexia, and autism. It is therefore very worthwhile to use language to explore deeper mental health problems. As the main carrier of individual information, social media offers researchers rich user language information to explore and mine, better assisting the diagnosis of potential mental illness patients.
The research in this paper is based on the large amount of user data generated on social media. The main goal is to accurately identify users with mental illness within that data. The main limitation of existing detection methods is that many models are based on machine learning algorithms that require substantial feature engineering. Existing deep learning models, in turn, pay little attention to model improvement and optimization, and struggle to pick out the key emotional features from the large number of posts a user publishes. In addition, because users' posts span a long period of time, existing models do not account for the temporal correlation and dependence between posts. These problems leave existing detection models with considerable room for improvement in performance.
This paper adopts a hierarchical structure to model the user's posting process, introduces the concepts of layering and gating weights into mental disease detection, and proposes two hierarchical neural network models for mental disease detection, named SGL-CNN and MGL-CNN. User datasets from different social media platforms each contain a certain number of posts per user. For all posts a user publishes within a given time span, the two proposed models identify the genuinely important emotional features and suppress unimportant feature information as much as possible.
Related work
Sentiment analysis methods based on social media data
The general sentiment analysis flow chart is shown in Fig. 1 . Lexicon-based methods treat sentiment words as the main basis for judging the sentiment polarity of a text. Constructing a sentiment lexicon requires considerable manual experience and rules to collect commonly used emotional words and assign them weights. The general process is: first, identify the emotional words that express the text's emotional tendency, using a well-built sentiment lexicon; then score the text with a corresponding algorithm to obtain its sentiment value.

Flow chart of text sentiment analysis based on sentiment lexicon
Sentiment lexicons play a very important role in sentiment analysis tasks; relying on their large stock of emotional words is the key to analyzing text sentiment. Sentiment lexicons can be constructed manually [ 11 ], through heuristic algorithms [ 12 ], or through machine learning algorithms. The advantage of lexicon-based methods is that domain-specific dictionaries can be collected by hand, yielding accurate, high-quality sentiment lexicons; the disadvantage is that doing so is labor-intensive.
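As a minimal illustration of the lexicon-based scoring described above, the following sketch sums per-word sentiment weights over a text; the tiny lexicon, its weights, and the simple negation rule are invented for the example, not part of any published lexicon:

```python
# Hypothetical sentiment lexicon: words and weights are illustrative only.
LEXICON = {"happy": 2.0, "good": 1.0, "sad": -2.0, "hopeless": -3.0, "tired": -1.0}
NEGATORS = {"not", "never", "no"}

def lexicon_score(text: str) -> float:
    """Sum lexicon weights over tokens, flipping the sign after a negator."""
    score, negate = 0.0, False
    for token in text.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        w = LEXICON.get(token, 0.0)
        score += -w if negate else w
        negate = False
    return score
```

A negative score flags a text whose emotional tendency leans negative; real systems add intensity modifiers and much larger dictionaries.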
The overall process of the machine learning approach is shown in Fig. 2 . Traditional text-level sentiment analysis usually combines machine learning with feature engineering, where features are information extracted from the data that is useful for predicting the outcome. Machine-learning-based sentiment classification mostly uses classical models such as support vector machines, naive Bayes, and maximum entropy models. The performance of most such classifiers depends on the quality of the labeled dataset, and obtaining high-quality labels requires substantial labor. Pang et al. [ 7 ] studied the effectiveness of traditional machine learning models (maximum entropy, naive Bayes, and support vector machines) on sentiment classification and compared them with traditional topic models. Compared with earlier methods, the method in [ 13 ] improves accuracy by 10 percentage points; however, it cannot express the structural features of documents and ignores the semantic relationships between sentences. Moreover, training and testing were carried out on the same dataset, so the trained model is highly dataset-dependent and not widely applicable. Turney et al. [ 14 ] extracted sentiment words from syntactic patterns and trained machine learning models to identify the sentiment polarity of documents. Wang and Manning [ 15 ] used word n-grams as features and chose naive Bayes, support vector machines (SVM), and their variants to implement sentiment analysis; they also proposed an SVM variant that uses the log-count ratio as the feature value.

Flow chart of text sentiment analysis based on machine learning
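The log-count ratio feature mentioned above for [ 15 ] can be sketched as follows; the add-one smoothing constant and whitespace tokenization are simplifying assumptions for the example:

```python
import math
from collections import Counter

def log_count_ratios(pos_docs, neg_docs, alpha=1.0):
    """For each token, the log of its smoothed relative frequency in
    positive documents versus negative documents (Wang & Manning style)."""
    pos = Counter(tok for d in pos_docs for tok in d.split())
    neg = Counter(tok for d in neg_docs for tok in d.split())
    vocab = set(pos) | set(neg)
    p_total = sum(pos[t] + alpha for t in vocab)  # smoothed positive mass
    q_total = sum(neg[t] + alpha for t in vocab)  # smoothed negative mass
    return {t: math.log(((pos[t] + alpha) / p_total) /
                        ((neg[t] + alpha) / q_total)) for t in vocab}
```

Tokens with a positive ratio lean toward the positive class; an SVM trained on these values replaces raw counts with class-discriminative weights.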
Traditional sentiment analysis methods rely mainly on designing feature functions (feature engineering), whereas deep learning lets a trainable neural network represent higher-level features, greatly simplifying feature engineering. The process of deep-learning-based text sentiment analysis is shown in Fig. 3 . Part of the deep learning text classification literature focuses on word embedding models; the other part builds and optimizes neural networks or classifiers. Two representative network models are the convolutional neural network (CNN) and the recurrent neural network based on long short-term memory (LSTM). CNNs learn to extract the hierarchical structure of key text elements; they are simple, efficient, and achieve good accuracy. Kim [ 16 ] first applied CNNs to text classification and proposed four variant models to improve performance with pre-training and word embeddings. Gated convolutional neural networks [ 17 ] first introduced gating units into CNN language modeling; the model provides a linear path for the gradient while retaining non-linear capacity, reducing vanishing gradients. Yang et al. [ 18 ] proposed a novel text classification model based on a gating mechanism, which generates multiple gate weights through convolution kernels of different sizes to control how much important information in the text is retained. Although CNNs originated in computer vision, they have been very successful in NLP tasks: because they do not depend on time sequence, they parallelize easily during training, saving considerable training time. Their disadvantage is that they cannot guarantee the sequential character of words and sentences.
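The gating idea of [ 17 ] — a linear convolutional path scaled by a sigmoid gate — can be sketched in miniature as follows (toy scalar weights over a 1-D sequence; a real model learns vector-valued kernels over word embeddings):

```python
import math

def gated_conv1d(x, conv_w, gate_w):
    """GLU-style gated convolution: out[i] = (w_c . window_i) * sigmoid(w_g . window_i).
    The linear term gives the gradient a linear path; the gate keeps non-linearity."""
    k = len(conv_w)
    out = []
    for i in range(len(x) - k + 1):
        window = x[i:i + k]
        a = sum(w * v for w, v in zip(conv_w, window))               # linear path
        g = 1 / (1 + math.exp(-sum(w * v for w, v in zip(gate_w, window))))
        out.append(a * g)                                            # gate scales the signal
    return out
```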
Long short-term memory (LSTM) networks can effectively alleviate exploding or vanishing gradients and learn longer-range dependencies. Xu et al. [ 19 ] proposed CLSTM to capture the emotional semantics of longer sequences; it adds a caching mechanism to the LSTM and divides the hidden-layer memory units into groups by forgetting rate, so that the group with a high forgetting rate acts as a cache. Tang et al. [ 20 ] used a gated neural network to model documents: a convolutional neural network first models each document at a fine granularity, and a gated network then aggregates the result. Yang et al. [ 21 ] proposed a hierarchical attention network for document classification; the model has a two-level attention mechanism, applied at the sentence and document levels, that can selectively focus on information-rich words and sentences.

Flow chart of text sentiment analysis based on deep learning
Mental illness detection method based on social media data
Self-expression and social support can help improve the mental health of people with mental illness. Moreover, the language users employ on social media can reveal their true inner thoughts: natural language reflects human personality, psychological state, and situational fluctuations. Identifying the language style of individuals with a tendency toward depression is therefore particularly important.
More and more depressed patients turn to online media (Twitter, Weibo, Reddit, etc.) to express their psychological problems and seek help [ 22 ]. Many users especially gravitate toward online forums where they can remain anonymous or post as guests. Early detection of depression from social media data has thus become an effective approach.
Sentiment analysis tasks related to mental health resemble traditional sentiment classification, and some studies mainly use traditional machine learning methods. For example, Schwartz et al. used Facebook data to build a regression model that predicts individual depression levels at multiple granularities [ 23 ]. Thompson et al. constructed a mental illness detection model based on a random forest classifier and a bag-of-words model using patient clinical records and online social media data [ 24 ], and used it to examine suicide risk and mental health in service members and veterans. Moreno et al. [ 25 ] applied depression detection algorithms to a large amount of data collected from Facebook and identified depressed users with reference to the symptoms of depression patients.
Beyond the good results machine learning has achieved in depression detection, many deep learning methods have also performed well in text classification and sentiment analysis; these methods rely only on the text content itself, not on external features. Gui et al. [ 26 ] proposed a novel cooperative multi-agent model to detect depressed users in a Twitter dataset; the model includes a text feature extractor and an image feature extractor, the former employing a gated recurrent unit and a convolutional neural network to extract sentiment features. Yang et al. [ 27 ] built two effective target-dependent sentiment classification models using bidirectional LSTMs. Besides LSTMs and recurrent networks, convolutional neural networks (CNNs) are also actively used in medical text classification. Nemeth et al. proposed a general CNN-based depression detection model that combines users' posts with their language and posting characteristics to assess depression and self-harm risk [ 28 ]. Cong et al. subsequently proposed a deep-learning method to handle the depression dataset RSDD, whose positive and negative classes are imbalanced [ 29 ]. Liang et al. introduced an emotional feature extraction model based on graph convolutional networks [ 30 ] that can selectively output sentiment features for a given aspect or entity.
Mental illness detection model based on gating weights and convolutional networks
This paper introduces two novel hierarchical mental illness detection models, named MGL-CNN and SGL-CNN, to identify mental illness patients in web forums. Since a user's overall data consists of a list of posts, and each post consists of a list of words, our model splits into two parts: a post feature representation layer and a user activity representation layer. The basic principle of both models is as follows. The model first produces a continuous post feature representation from the word representations in each post. It then takes the post representations as input to the second part to obtain the user's overall emotional state representation. Finally, it outputs this overall representation as the feature for mental illness classification. The overall block diagram of the mental illness detection model is shown in Fig. 4 . MGL-CNN and SGL-CNN are largely similar; they differ mainly in the number of gating units.

The overall block diagram of the mental illness detection model proposed in this paper
Previous natural language processing methods require more training time and computation because most adopt long short-term memory and attention-mechanism models to predict emotional polarity. The model proposed in this paper addresses this problem. First, we replace the recurrent connections customary in recurrent networks with gated temporal convolutions. Second, we use a dedicated convolutional encoder to convolve the input and obtain the gating weights independently, which helps identify patients with a tendency toward mental illness more accurately. Compared with previous methods, our method has no time dependence and can easily parallelize over a user's documents, improving computational efficiency.
In the proposed model, the user activity representation layer has the same structure as the post feature representation layer. The model's input passes through a multi-layer convolutional neural network with gating units, so that limited context information is fully exploited to extract the key features of the post representation efficiently. Figure 5 depicts the post feature representation layer of SGL-CNN, and Fig. 6 that of MGL-CNN. Both layers consist of two convolutional layers and a global average pooling layer; the main difference between the two models is the number of gating weights generated. Taking the post representation layer of MGL-CNN as an example: its first convolutional layer obtains an abstract feature map using two convolution kernels of different sizes. Next, different gating weights are obtained from the second convolutional layer, which contains the gating unit. The abstract feature map from the first convolutional layer is then multiplied element-wise by the gating weights from the second convolutional layer to obtain the feature representation of the post.

The specific block diagram of the post feature representation layer in the MGL-CNN model

The specific block diagram of the post feature representation layer in the SGL-CNN model
For ease of understanding, we next describe the process of extracting a feature with a single filter. Each word is represented by a vector stored in the word embedding matrix. We denote a user's post by {w1, w2, …, wi, …, wn}, and let xi ∈ R^d be the d-dimensional word vector corresponding to the i-th word in the post. A post of n words can then be formulated as the concatenation

x1:n = x1 ⊕ x2 ⊕ … ⊕ xn,  (1)

where ⊕ denotes vector concatenation.
The representation of a single post in the first convolutional layer is obtained by a CNN with multiple convolutional filters of different widths, following [ 29 ]. Convolution filters of different widths act as feature extractors and capture multi-granularity local information, analogous to n-grams; for example, a filter of width 2 captures the semantics of bigrams in user posts. Multiple feature maps are obtained from filters of different sizes. To generate a new feature, we apply a convolution kernel K ∈ R^s with stride 1 and size s to a window of s words. Writing Xi:i+s−1 for the concatenation of word vectors in a window of size s, we generate a new abstract feature

αi = f(K ∗ Xi:i+s−1 + b),  (2)
where b ∈ R is a bias term, ∗ denotes the convolution operation, and f is an activation function; in this section we use LeakyReLU. Applying the filter across the post yields a feature map

A = [α1, α2, …, αn−s+1],  (3)
where A ∈ R^(n−s+1)×1. Each feature map produced by filters of different sizes is then fed into the second convolutional layer, which comprises a convolution kernel and a gating unit. The goal of the second convolutional layer is to derive differentiated gating weights that better extract feature information from the first layer. Let the convolution kernel be F ∈ R^h×1, applied over the feature map A. Acting on the window Al:l+h−1, the kernel F yields the gating weight

gl = σ(F ∗ Al:l+h−1),  (4)

where σ denotes the sigmoid function, the usual choice for a gate.
The gating weights generated by convolving the feature map A with the kernel F form a gating weight matrix

G = [g1, g2, …, gl, …].  (5)
Assuming the second convolutional layer has m convolution kernels, we can extract m different gating weight matrices from the gating unit of MGL-CNN. The output feature map O is then obtained through the gating weight matrix:

O = A ⊗ G,  (6)
where ⊗ denotes element-wise multiplication.
In modeling from words to sentences, the output O is conditioned by the gating weights G. These gating weights resemble the weights learned by an attention mechanism: they assign a different importance to each word, which improves model performance, and they are multiplied with the feature map A to control which information propagates through the layers.
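To make the pipeline concrete, the following pure-Python sketch follows the structure described in this section: a first convolution producing the feature map A, a second convolution producing sigmoid gating weights G, and their element-wise product O. The single-filter shapes, the sigmoid gate, and the zero-padding that keeps G the same length as A are simplifying assumptions for illustration:

```python
import math

def leaky_relu(v, slope=0.01):
    return v if v > 0 else slope * v

def post_features(words, K, F, b=0.0):
    """words: list of d-dim word vectors; K: s x d first-layer kernel;
    F: length-h gating kernel applied over the feature map A."""
    s, d = len(K), len(K[0])
    # First convolutional layer: abstract feature map A over word windows.
    A = []
    for i in range(len(words) - s + 1):
        a = sum(K[j][t] * words[i + j][t] for j in range(s) for t in range(d)) + b
        A.append(leaky_relu(a))
    # Second convolutional layer: sigmoid gating weights G computed from A
    # (A is zero-padded so that G has the same length as A).
    h = len(F)
    padded = A + [0.0] * (h - 1)
    G = [1 / (1 + math.exp(-sum(F[j] * padded[i + j] for j in range(h))))
         for i in range(len(A))]
    # Element-wise product: the gates decide how much of each feature survives.
    return [a * g for a, g in zip(A, G)]
```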
To obtain the global information of a post, we feed the output of the second convolutional layer into a global average pooling layer and concatenate all output features to obtain the final representation of a single post. To compute a user's overall psychological representation, the ordered post representations are fed into the user activity representation layer. The resulting user features are passed to a fully connected softmax layer whose output is a probability distribution over the labels. The model is trained with the cross-entropy loss. Let pT denote the target sentiment distribution of each document d; the loss is then

Loss = −Σ_{d∈T} Σ_{c=1}^{C} p_c^T(d) · log p_c(d).  (7)
In formula ( 7 ), T represents the data used for training, and C represents the number of experimental categories.
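A minimal sketch of the softmax output layer and the cross-entropy loss of formula ( 7 ), for a single document with a one-hot target distribution:

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution over labels."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy between the target distribution p^T and the model output."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)
```

Summing this per-document loss over the training set T gives formula ( 7 ).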
Experimental results and analysis
This section experimentally tests the two mental disease detection models described above, MGL-CNN and SGL-CNN, to verify their effectiveness and accuracy. We select three large-scale social media datasets on mental illness: the Reddit Self-reported Depression Diagnosis (RSDD) dataset, the early depression detection dataset (eRisk2017), and the anorexia dataset (eRisk2018).
Experimental dataset
The Reddit Self-reported Depression Diagnosis (RSDD) dataset, a recent large-scale depression detection dataset, contains more than 9,000 users diagnosed with depression and about 107,000 mentally healthy control users. Non-depressed users were selected by matching candidate controls with diagnosed users. Social media data related to mental illness is usually highly private and sensitive, so personal data risks and privacy issues must be considered when collecting it. The RSDD dataset used in this paper includes only Reddit posts that users published voluntarily.
The authority of this dataset rests mainly on its user selection, which labels users as depressed on two grounds. First, a user repeatedly wrote self-reported depression-diagnosis sentences in their posts; these high-precision sensitive sentences determine the diagnosed user group. Second, users who do not meet the dataset construction rules are excluded; these rules mainly concern users whose number of posts is less than 100 and posts in which the number of depression-related characters does not exceed 80. Table 1 lists the user statistics of the RSDD dataset.
The early depression detection dataset (eRisk2017) is mainly used for the early risk detection task of depression. Its statistics are shown in Table 2 .
The early anorexia detection dataset (eRisk2018) is similar to the early depression detection dataset and was mainly used for an exploratory task on early risk detection of anorexia. It has few users, consisting of 61 anorexia patients and 411 mentally healthy users. Table 3 shows the statistics of its training and test sets. Although the dataset has few users, each user has a long posting history (on average more than 300 posts per user, with more than 20 words per post), so the amount of data per user is large.
Experimental evaluation method
The detection and analysis of mental health problems is essentially a binary or multi-class text classification problem, so this paper uses precision, recall, and the comprehensive F-measure to evaluate the detection results. Precision is the proportion of samples predicted as positive (negative) that are truly positive (negative):

Precision = TP / (TP + FP).
Recall is the proportion of truly positive samples that are correctly identified. Two cases arise: a positive sample predicted as positive is a true positive (TP), and a positive sample predicted as negative is a false negative (FN). Recall is calculated as:

Recall = TP / (TP + FN).
Precision and recall both range over [0, 1], and generally the higher both are, the better the model. In practice, however, precision and recall can conflict, so the F-measure arose as a comprehensive indicator that balances the two by a weighted combination:

F1 = 2 · Precision · Recall / (Precision + Recall).
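The three metrics can be computed directly from the confusion counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2PR/(P+R),
    with zero-division guarded by returning 0.0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For the detection task here, a high recall matters most, since a false negative is a missed potentially ill user.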
Experiment settings
The experiments in this paper use the RSDD, eRisk2017, and eRisk2018 datasets; RSDD and eRisk2017 were introduced above. In particular, the training set of the eRisk2018 anorexia dataset contains 20 anorexic users and 132 control users, while the test set contains 41 anorexic users and 279 control users. Table 4 shows the statistics of the training data after word segmentation, where c is the number of target categories, l the average post length, M the average number of posts per user, N the size of the dataset, and |V| the vocabulary size.
Table 5 lists the hyperparameter values used in the experiments. We tried gradually increasing the maximum number of posts starting from 400, but performance on the validation set did not improve significantly, so we finally chose 400, 350, and 350 as the maximum number of posts for the three datasets.
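A small helper showing the kind of post-count capping this implies; the padding token and the pad-to-fixed-length behavior are assumptions for illustration, since the paper only states the cap:

```python
def clip_user_posts(posts, max_posts=400, pad_token="<pad>"):
    """Truncate a user's post list to max_posts, or pad with a placeholder,
    so every user contributes a fixed-size input to the model."""
    if len(posts) >= max_posts:
        return posts[:max_posts]
    return posts + [pad_token] * (max_posts - len(posts))
```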
Results and analysis
This subsection compares the detection performance of the proposed models with that of the reference models. Table 6 and Fig. 7 compare the two proposed models with the other models on the RSDD dataset; the differences between the proposed models and the reference models are statistically significant. We first compare with MNB and SVM classifiers using sparse and rich features. Although MNB and SVM achieve good precision, they perform poorly on recall and F1 compared with the CNN- and LSTM-based models: Feature-rich-MNB and Feature-rich-SVM reach 0.69 and 0.71 in precision, but only 0.32 and 0.31 in recall. In mental disease detection, a higher recall means more potentially ill users are detected, avoiding missed diagnoses; for this task, therefore, recall and F1 deserve the most attention.

Comparison of graphical experimental results between the model proposed in this paper and the reference model on the eRisk2017 dataset
The experimental results also show that the attention mechanism helps LSTM perform well on detection tasks. Compared with previous work, the proposed SGL-CNN model outperforms user-model-CNN in both recall and F1. Table 7 and Fig. 8 compare the proposed models and the reference models on the early depression detection dataset; the performance of our models is close to the state-of-the-art methods in precision, recall, and F1. Table 8 shows the results of the proposed model and other state-of-the-art methods on the anorexia detection dataset. Among the reference models, UNSLD has the highest precision, 0.91, but the worst recall, only 0.71. Compared with the other reference models, the proposed MGL-CNN achieves more balanced results on all three indicators, with the highest F1 value of 0.85. Compared with FHDO-BCSGE, the best reference model on precision, our model has lower precision but a recall 2.4% higher and a comparable F1 value. The improved recall means that more ill users can be detected in the anorexia detection task, reducing the rate of missed detections and better supporting adjuvant treatment and help for more ill users.

Comparison of graphical experimental results between the model proposed in this paper and the reference model on the RSDD dataset
Conclusion
With the continuous development of social media, many detection models for mental illnesses such as depression and anorexia have now been studied. Mental illness is a broad public health topic, related to everyone's quality of life and family well-being as well as to social and economic development and harmony. Facing a large amount of user data, effectively mining its valuable content and making mental disease detection practical is a major challenge. In this paper, two detection methods based on a hierarchical emotion detection model are proposed to identify patients with mental illness. Compared with previous methods, the proposed models are more accurate and effective: by encoding a user's posts, they effectively represent the user's overall emotional state. In the experiments, we validated our models on the large-scale RSDD and eRisk2017 depression detection datasets and the eRisk2018 anorexia detection dataset; the results show that the proposed models significantly outperform the existing best methods in precision, recall, and F1. In the future, we will study mental illness detection based on multi-modal data, for example by effectively integrating image, voice, and text information to better identify ill users.
Availability of data and materials
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Herrman H, Patel V, Kieling C, Berk M, Buchweitz C, et al. Time for united action on depression: a Lancet-World Psychiatric Association Commission. Lancet. 2022;399(10328):957–1022.
Daly M, Robinson E. Depression and anxiety during COVID-19. Lancet. 2022;399(10324):518.
Moreno C, Wykes T, Galderisi S, Nordentoft M, Crossley N, et al. How mental health care should change as a consequence of the COVID-19 pandemic. Lancet Psychiatry. 2020;7(9):813–24.
Mathers C. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 2006;3(11):1–20.
Huang Y, Wang Y, Wang H, Liu Z, Yu X, et al. Prevalence of mental disorders in China: a cross-sectional epidemiological study. Lancet Psychiatry. 2019;6(3):211–24.
Feng Y, Xiao L, Wang W, Ungvari G, Ng C, Wang G, Xiang Y. Guidelines for the diagnosis and treatment of depressive disorders in China: the second edition. J Affect Disord. 2019;253:352–6.
Ren X, Yu S, Dong W, Xin P, Xu X, Zhou M. Burden of depression in China, 1990–2017: findings from the global burden of disease study 2017. J Affect Disord. 2020;268:95–101.
Tadesse M, Lin H, Xu B, Yang L. Detection of depression-related posts in reddit social media forum. IEEE Access. 2019;7:44883–93.
Eichstaedt J, Smith R, Merchant R. Facebook language predicts depression in medical records. Proc Natl Acad Sci. 2018;115(44):11203–8.
Richards V. The importance of language in mental health care. Lancet Psychiatry. 2018;5(6):460–1.
Das S, Chen M. Yahoo!for amazon: sentiment extraction from small talk on the web. Manage Sci. 2007;53(9):1375–88.
Kim S, Hovy E. Determining the sentiment of opinions. In: Proceedings of the 20th International Conference on Computational Linguistics; 2004. p. 1367–73.
Birjali M, Kasri M, Hssane A. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl-Based Syst. 2021;226:1–10.
Turney P, Littman M. Measuring praise and criticism: inference of semantic orientation from association. ACM Transactions on Information Systems. 2003;21(4):315–46.
Wang S, Manning C. Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); 2012. p. 90–4.
Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing; 2014. p. 1746–51.
Dauphin Y, Fan A, Auli M, Grangier D. Language modeling with gated convolutional networks. In: International Conference on Machine Learning. PMLR; 2017. p. 933–41.
Liu Y, Ji L, Huang R, Ming T, Gao C, Zhang J. An attention-gated convolutional neural network for sentence classification. Intelligent Data Analysis. 2019;23(5):1091–107.
Xu J, Chen D, Qiu X, Huang X. Cached long short-term memory neural networks for document-level sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016. p. 1660–9.
Tang D, Qin B, Liu T. Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; 2015. p. 1422–32.
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E. Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2016. p. 1480–9.
Malhotra A, Jindal R. Deep learning techniques for suicide and depression detection from online social media: A scoping review. Appl Soft Comput. 2022;130:1–12.
Chancellor S, Choudhury M. Methods in predictive techniques for mental health status on social media: a critical review. NPJ digital medicine. 2020;3(1):1–11.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Moreno M, Jelenchick L, Egan K, Cox E, Young H, Gannon K, Becker T. Feeling bad on Facebook: Depression disclosures by college students on a social networking site. Depress Anxiety. 2011;28(6):447–55.
Gui T, Zhu L, Zhang Q, Peng M, Zhou X, Ding K, Chen Z. Cooperative Multimodal Approach to Depression Detection in Twitter. Proc AAAI Conf Artif Intell. 2019;33(01):110–7.
Yang M, Tu W, Wang J, Xu F, Chen X. Attention based LSTM for target dependent sentiment classification. Proc AAAI Conf Artif Intell. 2017;31(1):1–2.
Nemeth R, Sik D, Mate F. Machine learning of concepts hard even for humans: The case of online depression forums. Int J Qual Methods. 2020;19:1–8.
Cong Q, Feng Z, Li F, Xiang Y, Rao G, Tao C. XA-BiLSTM: a deep learning approach for depression detection in imbalanced data. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2018: 1624–1627.
Liang B, Su H, Gui L, Cambria E, Xu R. Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl-Based Syst. 2022;235:1–11.
Acknowledgements
Not applicable.
Funding
There is no specific funding to support this research.
Author information
Authors and Affiliations
Las Positas College Dublin High School, Livermore, 94551, USA
Yuliang Guo
School of Computer and Software, Nanyang Institute of Technology, Nanyang, 473004, China
Zheng Zhang
College of Education, Hubei University Wuhan University of Arts and Science, Wuhan, 430062, China
Contributions
Yuliang Guo: Conceptualization, Methodology. Zheng Zhang: Data curation, Writing - original draft preparation. Xuejun Xu: Visualization, Investigation, Supervision. Yuancai Zhang: Software, Validation. Zheng Zhang and Yuliang Guo: Writing - reviewing and editing.
Corresponding author
Correspondence to Yuliang Guo.
Ethics declarations
Ethics approval and consent to participate
The present study was approved by the ethics committee of Nanjing Medical University and carried out according to the Declaration of Helsinki (World Medical Association, 2013). We received written informed consent from all participants.
Consent for publication
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Reprints and Permissions
About this article
Cite this article
Guo, Y., Zhang, Z. & Xu, X. Research on the detection model of mental illness of online forum users based on convolutional network. BMC Psychol 11, 424 (2023). https://doi.org/10.1186/s40359-023-01460-4
Download citation
Received: 24 July 2023
Accepted: 21 November 2023
Published: 04 December 2023
DOI: https://doi.org/10.1186/s40359-023-01460-4
Keywords
- Social media
- Mental illness
- Depression detection
- Neural network
- Feature representation
BMC Psychology
ISSN: 2050-7283
- Submission enquiries: [email protected]
- General enquiries: [email protected]
