Saturday, January 18, 2020

From 2011... False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

From 2011... False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Joseph P. Simmons, Leif D. Nelson, Uri Simonsohn. Psychological Science, October, 2011. https://doi.org/10.1177/0956797611417632

Abstract: In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

Keywords: methodology, motivated reasoning, publication, disclosure

---
In this article, we show that despite the nominal endorsement of a maximum false-positive rate of 5% (i.e., p ≤ .05), current standards for disclosing details of data collection and analyses make false positives vastly more likely. In fact, it is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis.

The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?

It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.
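
The arithmetic behind that last claim can be sketched under one simplifying assumption that is not in the article: that the alternative analyses are statistically independent. Analyses of the same data set are usually correlated, so the true inflation is smaller than this sketch suggests, but it still exceeds 5% whenever more than one analysis is tried. The function name is hypothetical.

```python
def familywise_false_positive_rate(num_analyses: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_analyses` tests is significant when no effect exists.

    Assumption (not from the article): the analyses are independent.
    """
    if num_analyses < 0:
        raise ValueError("num_analyses must be non-negative")
    # P(no false positive in any analysis) = (1 - alpha) ** num_analyses
    return 1.0 - (1.0 - alpha) ** num_analyses


for k in (1, 2, 5, 10, 20):
    print(k, round(familywise_false_positive_rate(k), 3))
# 1 -> 0.05, 2 -> 0.098, 5 -> 0.226, 10 -> 0.401, 20 -> 0.642
```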

This exploratory behavior is not the by-product of malicious intent, but rather the result of two factors: (a) ambiguity in how best to make these decisions and (b) the researcher’s desire to find a statistically significant result. A large literature documents that people are self-serving in their interpretation of ambiguous information and remarkably adept at reaching justifiable conclusions that mesh with their desires (Babcock & Loewenstein, 1997; Dawson, Gilovich, & Regan, 2002; Gilovich, 1983; Hastorf & Cantril, 1954; Kunda, 1990; Zuckerman, 1979). This literature suggests that when we as researchers face ambiguous analytic decisions, we will tend to conclude, with convincing self-justification, that the appropriate decisions are those that result in statistical significance (p ≤ .05).

Ambiguity is rampant in empirical research. As an example, consider a very simple decision faced by researchers analyzing reaction times: how to treat outliers. In a perusal of roughly 30 Psychological Science articles, we discovered considerable inconsistency in, and hence considerable ambiguity about, this decision. Most (but not all) researchers excluded some responses for being too fast, but what constituted “too fast” varied enormously: the fastest 2.5%, or faster than 2 standard deviations from the mean, or faster than 100 or 150 or 200 or 300 ms. Similarly, what constituted “too slow” varied enormously: the slowest 2.5% or 10%, or 2 or 2.5 or 3 standard deviations slower than the mean, or 1.5 standard deviations slower from that condition’s mean, or slower than 1,000 or 1,200 or 1,500 or 2,000 or 3,000 or 5,000 ms. None of these decisions is necessarily incorrect, but that fact makes any of them justifiable and hence potential fodder for self-serving justifications.
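
To illustrate how much room these rules leave, the sketch below applies a few of the "too slow" criteria listed above to one small set of reaction times. The data and function names are hypothetical, invented only for illustration, and the standard-deviation rule is computed on the pooled sample rather than per condition.

```python
from statistics import mean, stdev


def too_slow_absolute(reaction_times_ms, cutoff_ms):
    """Responses slower than a fixed cutoff in milliseconds."""
    return [rt for rt in reaction_times_ms if rt > cutoff_ms]


def too_slow_sd(reaction_times_ms, num_sd):
    """Responses more than `num_sd` sample standard deviations above the mean."""
    limit = mean(reaction_times_ms) + num_sd * stdev(reaction_times_ms)
    return [rt for rt in reaction_times_ms if rt > limit]


# Hypothetical reaction times (ms); not data from the article.
rts = [320, 350, 410, 450, 480, 520, 600, 750, 1100, 2400]

print(too_slow_absolute(rts, 1000))  # [1100, 2400]
print(too_slow_absolute(rts, 2000))  # [2400]
print(too_slow_sd(rts, 2))           # [2400]
print(too_slow_sd(rts, 3))           # []
```

Each rule is defensible on its own, yet the four rules exclude two, one, one, and zero observations from the same data, which is the kind of ambiguity the authors describe.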

From 2015... Estimating the reproducibility of psychological science: Innovation points out paths that are possible; replication points out paths that are likely; progress relies on both

From 2015... Estimating the reproducibility of psychological science. "Open Science Collaboration." Science, Vol. 349, Issue 6251, aac4716. Aug 28 2015, http://dx.doi.org/10.1126/science.aac4716

Empirically analyzing empirical evidence: One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study.

Structured Abstract
INTRODUCTION Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. Scientific claims should not gain credence because of the status or authority of their originator but by the replicability of their supporting evidence. Even research of exemplary quality may have irreproducible empirical findings because of random or systematic error.

RATIONALE There is concern about the rate and predictors of reproducibility, but limited evidence. Potentially problematic practices include selective reporting, selective analysis, and insufficient specification of the conditions necessary or sufficient to obtain the results. Direct replication is the attempt to recreate the conditions believed sufficient for obtaining a previously observed finding and is the means of establishing reproducibility of a finding with new data. We conducted a large-scale, collaborative effort to obtain an initial estimate of the reproducibility of psychological science.

RESULTS We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. There is no single standard for evaluating replication success. Here, we evaluated reproducibility using significance and P values, effect sizes, subjective assessments of replication teams, and meta-analysis of effect sizes. The mean effect size (r) of the replication effects (Mr = 0.197, SD = 0.257) was half the magnitude of the mean effect size of the original effects (Mr = 0.403, SD = 0.188), representing a substantial decline. Ninety-seven percent of original studies had significant results (P < .05). Thirty-six percent of replications had significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.

CONCLUSION No single indicator sufficiently describes replication success, and the five indicators examined here are not the only ways to evaluate reproducibility. Nonetheless, collectively these results offer a clear conclusion: A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes. Moreover, correlational evidence is consistent with the conclusion that variation in the strength of initial evidence (such as original P value) was more predictive of replication success than variation in the characteristics of the teams conducting the research (such as experience and expertise). The latter factors certainly can influence replication success, but they did not appear to do so here.

Reproducibility is not well understood because the incentives for individual scientists prioritize novelty over replication. Innovation is the engine of discovery and is vital for a productive, effective scientific enterprise. However, innovative ideas become old news fast. Journal reviewers and editors may dismiss a new test of a published idea as unoriginal. The claim that “we already know this” belies the uncertainty of scientific evidence. Innovation points out paths that are possible; replication points out paths that are likely; progress relies on both. Replication can increase certainty when findings are reproduced and promote innovation when they are not. This project provides accumulating evidence for many findings in psychological research and suggests that there is still more work to do to verify whether we know what we think we know.



The Effects of Militarized Interstate Disputes on Incumbent Voting Across Genders

The Effects of Militarized Interstate Disputes on Incumbent Voting Across Genders. Shane P. Singh, Jaroslav Tir. Political Behavior, December 2019, Volume 41, Issue 4, pp 975–999. https://link.springer.com/article/10.1007/s11109-018-9479-z

Abstract: Gender and politics research argues that men are more hawkish and supportive of militarized confrontations with foreign foes, while women ostensibly prefer more diplomatic approaches. This suggests that, after a militarized confrontation with a foreign power, women’s likelihood of voting for the incumbent will both decrease and be lower than that of men. Our individual-level, cross-national examinations cover 87 elections in 40 countries, 1996–2011, and show only some support for such notions. Women punish incumbents when their country is targeted in a low-hostility militarized interstate dispute (MID) or when their country is the initiator of a high-hostility MID. The low-hostility MID initiation and high-hostility MID targeting scenarios, meanwhile, prompt women to be more likely to vote for the incumbent. Importantly, men’s reactions rarely differ from women’s, casting doubt on the existence of a gender gap in electoral responses to international conflict.

Keywords: Voting behavior; Gender; Conflict; Diversion; Rally

Replication code and data for this paper are available in the Political Behavior Dataverse at:  https://doi.org/10.7910/DVN/O9UVFU

Young adults who expect to do worse than their parents in the future are indeed more likely to locate themselves at the extreme ends of the ideological scale, and most of them are in the Left

Extreme Pessimists? Expected Socioeconomic Downward Mobility and the Political Attitudes of Young Adults. Elena Cristina Mitrea, Monika Mühlböck, Julia Warmuth. Political Behavior, January 18 2020. https://link.springer.com/article/10.1007/s11109-020-09593-7

Abstract: In recent decades, and especially since the economic crisis, young people have been finding it more difficult to maintain or exceed the living standards of their parents. As a result, they increasingly expect socioeconomic downward mobility. We study the influence of such a pessimistic view on political attitudes, assuming that it is not so much young adults’ current economic status, but rather their anxiety concerning a prospective socioeconomic decline that affects their ideological positions. Drawing on data from a survey among young adults aged 18–35 in eleven European countries, we explore to what extent expected intergenerational downward mobility correlates with right-wing and left-wing self-placement. We find that young adults who expect to do worse than their parents in the future are indeed more likely to locate themselves at the extreme ends of the ideological scale.

Keywords: Socioeconomic mobility; Intergenerational; European; Political attitudes; Left–right self-placement


A Cross-Sectional and Longitudinal Analysis: Current tests find no effect of light and moderate alcohol drinking on cognitive performance (memory, planning, & reasoning)

Alcohol Consumption, Drinking Patterns, and Cognitive Performance in Young Adults: A Cross-Sectional and Longitudinal Analysis. Henk Hendriks et al. Nutrients 2020, 12(1), 200. January 13 2020. https://doi.org/10.3390/nu12010200

Abstract: Long-term alcohol abuse is associated with poorer cognitive performance. However, the associations between light and moderate drinking and cognitive performance are less clear. We assessed this association via cross-sectional and longitudinal analyses in a sample of 702 Dutch students. At baseline, alcohol consumption was assessed using questionnaires and ecological momentary assessment (EMA) across four weeks (‘Wave 1’). Subsequently, cognitive performance, including memory, planning, and reasoning, was assessed at home using six standard cognition tests presented through an online platform. A year later, 436 students completed the four weeks of EMA and online cognitive testing (‘Wave 2’). In both waves, there was no association between alcohol consumption and cognitive performance. Further, alcohol consumption during Wave 1 was not related to cognitive performance at Wave 2. In addition, EMA-data-based drinking patterns, which varied widely between persons but were relatively consistent over time within persons, were also not associated with cognitive performance. Post-hoc analyses of cognitive performance revealed higher within-person variance scores (from Wave 1 to Wave 2) than between-person variance scores (both Wave 1 and Wave 2). In conclusion, no association was observed between alcohol consumption and cognitive performance in a large Dutch student sample. However, the online cognitive tests performed at home may not have been sensitive enough to pick up differences in cognitive performance associated with alcohol consumption.

Keywords: young adult; alcohol consumption; cognitive performance

4. Discussion

We hypothesized that light to moderate drinkers would obtain cognitive task scores similar to those of abstainers, whereas heavy drinkers would obtain lower cognitive task scores. While the first part of our hypothesis was retained, we did not find lower cognitive task scores for heavy drinkers. In this study, we did not find any consistent association between alcohol consumption and cognitive performance in a large population-based sample of young Dutch adults. This observation was made both cross-sectionally and longitudinally after a one-year follow-up. These null findings were observed for both the average amount of alcohol consumed and the various drinking patterns. However, the results of this study should be interpreted with caution, because the null findings have to be viewed in light of the high variance of the cognition scores.

A strength of this study is the use of a large and homogeneous group of young adults: all students of similar age and similar level of education. This is relevant because cognitive performance largely depends on age and educational level. The group, however, spanned a large range of alcohol consumption and included various drinking patterns. Both the cross-sectional and longitudinal analyses used validated and well-recognized cognition tests. We selected these cognitive tests, since we considered them to provide a somewhat better indicator for day-to-day functioning and brain health as compared to functional MRI images showing changing patterns of blood circulation [14,15].

EMA may be a suitable methodology for alcohol consumption evaluation. EMA encompasses the brief but intensive repeated assessment of people’s thoughts, feelings, and behaviors in their real-world settings. The ecological validity of EMA data is considered high [19]. EMA reduces retrospective bias when assessing alcohol consumption, as suggested by the higher consumption recorded with EMA than with regular questionnaires. EMA also has a low cognitive bias due to direct retrieval [33]. Furthermore, the repetitive data collection allowed the study of drinking patterns in addition to commonly reported average consumption levels. This is relevant since the drinking pattern, such as binge drinking, may be an important determinant of the harmful effects of drinking [11,12,20].

Population surveys using questionnaires typically report underestimates of alcohol consumption of approximately 40–50%. Researchers adjust alcohol survey data to weight estimates such that they match alcohol sales or alcohol tax data. The current study suggests that underestimation of alcohol consumption in this population exists, but to a lesser extent than assumed in population surveys. EMA has been recognized as an alternative for assessing alcohol consumption in the natural environment [34].

Previous studies found inconsistent results on the relation between alcohol consumption and cognitive performance. The majority of studies indicate that long-term heavy drinking has strong negative associations with diseases of the brain such as dementia [35]. Many short-term studies indicate cognitive impairment in heavy binge drinkers as compared to nondrinking controls [8–13]. The outcome of comparing two groups differing in drinking habits, however, may depend on the selection criteria and may potentially be hampered by confounding. Excessive heavy drinking is usually accompanied by impulsive behaviors, risk-seeking behavior [36], and other traits [16] that may confound the association between alcohol consumption and cognitive performance. Some authors suggest that impaired cognitive performance may partly predict excessive alcohol consumption, whereas excessive alcohol consumption does not always predict impaired cognitive functioning [37].

Contrary to the differences in cognitive performance between heavy binge-drinkers and nonbinge-drinking controls, long-term moderate drinking has been associated with a reduced risk of dementia and a reduced risk of cognitive decline. Reviews of prospective studies showed that moderately drinking elderly have a decreased risk of dementia and cognitive decline [38,39]. Thus, after a very long follow-up, moderately drinking persons may be expected to show a less severe decline in cognitive performance than either abstainers or those who drink excessively. This suggests that there may be a J-shaped association between alcohol consumption and dementia and cognition, as has been described for cardiovascular diseases [40]. The risk reduction for dementia and age-related cognitive decline observed in the elderly may occur through a mechanism related to cardiovascular disease risk factors, whereas the cognitive impairment observed in young binge-drinking adults may occur through a mechanism related to neurotoxicity.

Our results correspond with those reported previously by Boelema et al. [18]. The null findings regarding the association between alcohol consumption and cognitive performance in that study were interpreted as being methodological in nature; the tests used may not have been sensitive enough to detect a potential cognitive performance reduction as a consequence of alcohol consumption. We also used conventional standard tests that are routinely used for cognitive performance evaluations. However, some aspects of our testing differed. Firstly, the tests were performed in an ‘at home situation’ as opposed to ‘at a testing facility’, which may have affected the results in various ways. For some individuals, performing cognitive tests in an environment that they are familiar with may positively influence performance. For others, the at-home environment may have provided more distraction, or the lack of experimental control and the fact that no experimenter was present may have reduced focus and motivation, negatively affecting performance. All these factors may have affected test results and might explain the high within-person variability. Secondly, cognitive tests employed in the present study did not allow evaluation of aspects like reaction time, which may have contributed to a less complete test result.

The cognition tests did seem to detect differences, since small significant differences were observed for education level. It is important, however, to extend these studies to enable detection of small differences in cognitive performance that may be induced by light and moderate alcohol drinking. Significant differences in cognition tests may be detected by decreasing the variability in the cognition test outcomes.

Although the study was set up with a group of students to obtain a high degree of homogeneity, this also has its limitations. The results obtained in this group cannot be generalized to the general population nor to specific other groups like persons with a low socioeconomic status. Specific groups may respond differently to alcohol consumption and may have more difficulty in adapting their drinking pattern whenever needed. In general, it has been extensively described that adolescents are less sensitive to the negative effects of alcohol, including cues that influence self-regulation of intake, but are more sensitive to positive effects, which may serve to reinforce or promote excessive intake [41]. This response to alcohol may promote the development of alcohol use disorders, a development to which university students may be less vulnerable than other groups of adolescents [7].

Our study design, however, had several limitations that warrant consideration. The null findings of this study have to be viewed in light of the high variance of the cognition scores. Whereas in the ‘real-life’ study, the within-person variability was higher than the between-person variability, in the laboratory study, the within-person variability was lower than the between-person variability. This suggests that the use of cognition tests in a ‘real-life’ setting may not have been suitable or sufficiently sensitive to detect a possible reduction in cognitive performance in association with alcohol consumption. Some of the tasks were, however, sensitive to education level, as university students outperformed polytechnic students, which would be expected as the former is a higher level of education. Furthermore, it is expected that the cognition tests used in this study might have been adequate to detect (possible) small differences in cognitive performance when used in a laboratory setting, provided a sufficiently large participant population.

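
The comparison of within-person and between-person variability can be made concrete with a small sketch. The excerpt does not state how the authors computed these quantities, so the definitions here are assumptions: between-person variance is taken as the sample variance of each person's mean score across the two waves, and within-person variance as the average, over persons, of the sample variance of that person's two scores. The scores and the function name are hypothetical.

```python
from statistics import mean, variance


def variance_components(wave1, wave2):
    """Return (between_person, within_person) variance for paired two-wave scores.

    Assumed definitions (not taken from the paper):
      between_person: sample variance of each person's mean over both waves
      within_person:  mean over persons of the sample variance of their two scores
    """
    if len(wave1) != len(wave2) or len(wave1) < 2:
        raise ValueError("need paired scores for at least two persons")
    person_means = [(a + b) / 2 for a, b in zip(wave1, wave2)]
    between_person = variance(person_means)
    # Sample variance of two values a and b equals (a - b)**2 / 2.
    within_person = mean([(a - b) ** 2 / 2 for a, b in zip(wave1, wave2)])
    return between_person, within_person


# Hypothetical cognition scores for three persons at Wave 1 and Wave 2.
between, within = variance_components([10, 20, 30], [12, 18, 30])
print(between, within)
```

In this made-up example the between-person variance is far larger than the within-person variance, which is the pattern the authors attribute to laboratory testing; the at-home data showed the reverse.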
In the present study, follow-up time was only one year. It would have been interesting to show in the same cohort that students who keep on drinking in a hazardous way will show cognitive impairment after many years. Boelema et al. [18], however, did report on cognitive performance after a four-year follow-up yet did not find indications for cognitive impairment in adolescent drinkers, including heavy drinkers.

In conclusion, it is important to build on this study by reducing variance in online cognitive testing or by testing in a laboratory setting to better assess the association between light and moderate alcohol drinking and cognitive performance. In the present study, variance in cognitive performance was too large to detect an association, if any, between alcohol consumption and cognitive performance. Future studies should carefully consider both the context in which cognition is assessed and the type of tasks that are used.

Body mass index is a highly heritable trait, but heritability estimates of BMI are lower in childhood because of the influence of shared environmental factors, and in old age because of unique environmental factors

Obesity and eating behavior from the perspective of twin and genetic research. Karri Silventoinen, Hanna Konttinen. Neuroscience & Biobehavioral Reviews, Volume 109, February 2020, Pages 150-165. https://doi.org/10.1016/j.neubiorev.2019.12.012

Highlights
• Body mass index (BMI, kg/m²) is a highly heritable and polygenic trait.
• Heritability increases after early childhood and is highest in early adulthood.
• Obesogenic micro- and macro-environments reinforce genetic variation.
• Candidate genes of BMI express in brain tissue, suggesting the importance of behavior.
• Emerging evidence suggests that genes can affect BMI through eating behavior traits.

Abstract: Obesity has dramatically increased during the last decades and is currently one of the most serious global health problems. We present a hypothesis that obesity is a neuro-behavioral disease having a strong genetic background mediated largely by eating behavior and is sensitive to the macro-environment; we study this hypothesis from the perspective of genetic research. Genetic family and genome-wide-association studies have shown well that body mass index (BMI, kg/m²) is a highly heritable and polygenic trait. New genetic variation of BMI emerges after early childhood. Candidate genes of BMI notably express in brain tissue, supporting that this new variation is related to behavior. Obesogenic environments at both childhood family and societal levels reinforce the genetic susceptibility to obesity. Genetic factors have a clear influence on macro-nutrient intake and appetite-related eating behavior traits. Results on the gene-by-diet interactions in obesity are mixed, but emerging evidence suggests that eating behavior traits partly mediate the effect of genes on BMI. However, more rigorous prospective study designs controlling for measurement bias are still needed.

Keywords: Twins; Genetics; Obesity; BMI; Eating behavior

7. Conclusions
A century of genetic family studies and a decade of GWA studies have dramatically increased our understanding of the genetic architecture of common obesity, eating behavior and their mutual associations. However, this increasing knowledge has also clearly demonstrated the challenges, especially when trying to understand the mechanisms of how genes affect BMI and other obesity indicators. BMI has been shown to be a highly heritable trait, but the heritability changes over the life course. The heritability estimates of BMI are lower in childhood and in old age as compared to early adulthood and middle age. In childhood, the lower heritability is because of the influence of environmental factors shared by co-twins and in old age because of environmental factors unique to each twin. A similar pattern of increasing influence of genetic factors and diminishing effect of the shared environment during late childhood and adolescence has been reported for many psychological traits, such as intelligence (Plomin and Deary, 2015), and probably reflects the changing dynamics of the interplay between genes and the environment. During adolescence, dependence on parents decreases, social networks widen, influence from peers becomes stronger and sensation-seeking increases (Ahmed et al., 2015; Kilford et al., 2016). This probably leads to the possibility to more freely create one’s own environment, including the environment influencing BMI, which is partly affected by genetically influenced preferences. There is a lot of evidence for this so-called active gene–environment correlation for psychiatric traits (Jaffee and Price, 2007), and genetic factors have been found to influence life events, also demonstrating the dependence of genes and environment (Kendler and Baker, 2007). However, for BMI the direct evidence on gene–environment correlations is still suggestive.
Studies on the heritability of macro-nutrient intake and eating patterns suggest that shared environmental factors have an effect on eating behavior in childhood and adolescence, and this influence disappears by adulthood. Twin and molecular genetic studies have shown that after early childhood new genetic variance emerges. It is very possible that this genetic variance is related to eating behavior when children can more independently regulate their own eating, but direct evidence is still lacking. There is some evidence that eating behaviors can modify the genetic effects of obesity, but most of these studies are based on cross-sectional data and the results are somewhat mixed. Thus, more studies on how the interplay between genes and the environment modifies the genetic architecture of BMI during the formative years of childhood and adolescence are still needed. The strong effect of genetic factors on BMI does not mean, however, that the family environment does not have an effect on BMI. Adoption studies have clearly shown that the adoptive family also has an effect on BMI. A likely explanation for these results is that the family environment affects BMI by reinforcing the effect of genes affecting BMI. There is direct evidence on this based mainly on twin studies since both the micro-level environment (e.g., parental education) and the macro-level environment (measured as the level of obesity between countries and measurement years) affect the genetic variation of BMI. Thus, those children having a genetic susceptibility to obesity gain more weight in family environments or societies predisposing to obesity. These results underline the importance of community food environments, since they can suppress or reinforce the effects of genetic variants associated with obesity. There has been a lot of discussion on which specific community-level factors are behind the obesogenic environments, but there is no clear consensus (Kirk et al., 2009).
The associations are also likely to be very complex, as found in a previous study demonstrating that the community food environment can modify how health counseling affects eating behavior (Lorts et al., 2019). There is a lack of studies on whether the micro- and macro-environment can modify the genetic variation of macro-nutrient intake in a similar way as they affect the genetic variation of BMI. Thus, more research is needed to specify which community-level factors reinforce the genetic variation of BMI and analyze the role of eating behavior behind these associations. GWA studies have clearly shown that BMI is a highly polygenic trait and thus confirm the basic principle of genetic family studies. The mechanisms of how genes affect BMI are still poorly understood, but the expression of the candidate genes of BMI in the brain tissue suggests that they affect BMI through behavioral factors. There is also evidence based on both twin and GWA studies that genetic factors affect macronutrient intake and appetite-related eating behavior traits. However, to date, there is only limited direct evidence on the overlap of genes affecting BMI and eating behavior which would suggest that the genes affect BMI through eating behavior. Some studies have shown this mediation effect, but they can explain only a fraction of the association between genetic factors and BMI. This area is, however, very challenging because of the well-known difficulties of measuring dietary intake and the reliance on self-report scales to assess eating behavior traits. Some sex differences in the genetic architecture of obesity indicators were identified. In BMI the proportion of genetic variation was roughly similar in males and females from infancy to old age, but especially after puberty, somewhat different sets of genes started to affect BMI in males and females and this difference increased during adulthood.
It is likely that this reflects differences in body composition since somewhat different sets of genes affect muscle and fat body tissues. Accordingly, the SNPs associated with WHR adjusted for BMI showed different effect sizes in males and females. Very little is still known on sex differences in the genetic architecture of eating behavior. Thus, it is too early to argue whether genetic factors affect obesity traits in males and females differently through eating behavior or whether the differences found reflect only endocrinological differences between the sexes. At the beginning of this review we presented the hypothesis: Obesity is a neuro-behavioral disease having a strong genetic background mediated largely by eating behavior and being sensitive to the macroenvironment. There is strong evidence for this hypothesis based on previous genetic research, but the evidence that the genes affect BMI especially through eating behavior is still emerging and mainly indirect at the moment. More rigorous prospective study designs controlling the well-known biases of measuring food intake would be necessary to prove this part of the hypothesis or to show that other behavioral mechanisms are also important when explaining the effect of genes on BMI.