10 Commandments for Judging the Merits of Empirical Papers

There are a host of pathologies associated with the current peer review system that have been the subject of much discussion. One of the most substantive is that the papers published in leading journals tend to be those with the most exciting and newsworthy findings. The problem is that what is novel and newsworthy to some may, to others, be overreaching and of questionable validity. Publishing ‘sexy’ findings of questionable validity is often facilitated by problems in the research design, such as small samples and the winner’s curse, multiple comparisons, and the selective reporting of results.
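
To make the small-sample problem concrete, here is a minimal simulation sketch of the winner's curse. Everything in it is an illustrative assumption rather than anything from a real study: the true effect of 0.2 standard deviations, the 20 observations per group, the normal approximation to the significance test, and the hypothetical "publication filter" that keeps only significant positive estimates.

```python
import random
import statistics


def simulate_study(true_effect, n, rng):
    """Simulate one two-group study; return (estimated effect, significant?)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of a difference in means (unequal-variance form).
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    # Normal approximation to a two-sided 5% test (an assumption for brevity).
    return diff, abs(diff / se) > 1.96


def winners_curse(true_effect=0.2, n=20, n_studies=5000, seed=1):
    """Compare the average estimate across all studies with the 'published' ones."""
    rng = random.Random(seed)
    results = [simulate_study(true_effect, n, rng) for _ in range(n_studies)]
    all_estimates = [d for d, _ in results]
    # Hypothetical publication filter: significant and in the expected direction.
    published = [d for d, sig in results if sig and d > 0]
    return {
        "true_effect": true_effect,
        "mean_all": statistics.fmean(all_estimates),
        "mean_published": statistics.fmean(published),
        "share_published": len(published) / n_studies,
    }


if __name__ == "__main__":
    for key, value in winners_curse().items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the average across all simulated studies sits close to the true effect, while the average among the "published" studies is several times larger, because with so few observations only an overestimate can clear the significance threshold.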

Fortunately, these issues have prompted much discussion and self-reflection amongst scientists across all disciplines. While career incentives may lead researchers to be careless with their analysis in order to publish exciting findings, most often the issues result not from any malfeasance but from misinformation coupled with cognitive biases, such as confirmation bias, to which we are all susceptible (e.g. we tend to see only the evidence we want to see). Ultimately, I feel much of this is a problem with statistics education and, more generally, with teaching that focuses on techniques rather than on a problem-oriented skillset, without appropriate attention to critical thinking skills. Certainly, upon graduation I could carry out all manner of analytical techniques without really understanding what I was doing!

Much of the focus in this area has, correctly, been on getting researchers to change their behaviour by being more reflective and more transparent in presenting their methods. Less focus has been placed on the behaviour of reviewers and editors. What should reviewers be on the lookout for? It can be hard to distinguish novel yet valid results from those of questionable validity, particularly for those without a great deal of experience of working with data. This blog is an attempt to provide some general rules to guide reviewers.

In particular, in the spirit of the insightful and engaging 10 commandments of applied econometrics, I have prepared 10 commandments for reviewers. The list is by no means exhaustive, and it is aimed predominantly at empirical as opposed to purely theoretical papers. One of the main motivations, if not the main one, for how researchers structure and write their papers is that they follow the approach they think editors and reviewers are most likely to deem publishable. Researchers respond to incentives, and I suggest that if reviewers follow these simple steps we can change the ‘rules of the game’ and, ultimately, change submission practices and behaviour.

1. Be more open to uncertainty: Reviewers, at least in my experience, have a preference for strong statements regarding causality, but things are not always so black and white. It should be okay (encouraged even) for authors to present their findings as suggestive, acknowledge limitations and suggest what future work is needed. Increasingly, however, what reviewers demand is unfailing certainty. Researchers often aim to demonstrate ‘proof’ of the estimated effect and, through a series of robustness checks, to show that no alternative interpretation of the findings is possible. This in turn can lead authors to overstate their findings for fear of being punished by reviewers if they were more circumspect.

2. Be more accepting of small/modest effect sizes: Not every study will demonstrate that changes in the key variable of interest lead to big changes in the outcome under examination. Indeed, most will not, or at least should not. Research proceeds incrementally and demanding large effect sizes is unrealistic. The main problem here is that effect sizes can often be presented in a number of different ways, and such expectations distort incentives so that authors find creative ways of presenting their estimated effects as ‘large’. While publication should not depend on demonstrating large effect sizes, neither should effect sizes be so small as to be trivial and unimportant. Statistical significance alone should not be enough. Apart from the problem of how the magnitude of effect sizes is reported, an equally serious issue is that it is relatively common for authors to make little effort to report effect sizes at all. This should also be discouraged by reviewers.

3. Don’t be fooled/impressed by complexity: Econometric complexity should not be mistaken for rigour. Simple analyses are often not only easier to understand and communicate but also less likely to lead to serious errors or lapses. If complicated models are needed, ensure that the researchers have presented all the necessary detail so that the ‘technical detail’ can be readily understood. Depending on context, prudent questions to ask might include: What does the simple bivariate relationship look like? Do the results hold up even without the somewhat strange-looking functional specification? What do the results look like from a simple comparison of the treatment group with the control group, before the addition of control variables?

4. As a natural extension to the above, apply the laugh test: Apply what Kennedy (2002) refers to as the ‘laugh’ test, or what Hamermesh (2000, p. 374) calls the ‘sniff’ test: ‘ask oneself whether, if the findings were carefully explained to a thoughtful layperson, that listener could avoid laughing’. If the results appear too good to be true, they often are.

5. Ask the right questions: Some potentially useful questions that I commonly find myself asking include: Is the explanation for why the results are observed only for a particular sub-group or in a particular situation plausible? Relatedly, has the explanation been derived to fit the results (observational data is inherently noisy!), or is it reasonable that the authors held these priors in advance? Are the substantive results greatly affected by adopting different procedures that seem more sensible? This might include changes to the functional form or the selection of control variables. I want to emphasise that these questions should not be designed to ‘null hack’ findings away (also consider point 10 here) but rather to get a general sense of how sensible the analysis is and whether the conclusions are warranted.

6. Don’t be fooled by, or ask for too much by way of, robustness checks: Robustness checks can be important, but they should not be needed to persuade you of the veracity of the main findings. It is perhaps also worth noting that, from an author’s perspective, requests for additional robustness checks can add substantial time, often with little additional benefit.

7. Don’t discriminate: Judge the research on its merits, not by whether it is on a topic you like or comes from big-name researchers or institutes. Publication bias is such that the people doing careful and considered research may not always be the ones with big ‘reputations’.

8. Related to the point above, and one aimed at editors: don’t desk reject based on how newsworthy the reported findings are.

9. Be wary of grandiose statements, convoluted language or abstract theoretical frameworks.

10. Don’t obsess over p-values: As the American Statistical Association puts it, “A conclusion does not immediately become ‘true’ on one side of the divide and ‘false’ on the other”. A p-value coupled with an estimated effect size conveys some useful information, but it should not be the be-all and end-all. As the ASA statement concludes, “No single index should substitute for scientific reasoning.” Personally speaking, there are many things I look for in judging the scientific merits of a paper (many of which are discussed above), but the actual reported p-value, whether it be .01 or .10, is far down the list.
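
The points above about effect sizes (2) and p-values (10) can be illustrated with a small calculation. This is only a sketch under simplifying assumptions: two equal-sized groups with unit variance, a normal approximation to the test statistic, and three made-up scenarios. The function name `two_sided_p` and all the numbers are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardised mean difference d.

    Assumes two equal-sized groups with unit variance and a normal
    approximation to the test statistic (a simplification for illustration).
    """
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical scenarios: (label, standardised effect d, n per group).
    scenarios = [
        ("trivial effect, small sample", 0.02, 100),
        ("trivial effect, huge sample", 0.02, 100_000),
        ("moderate effect, small sample", 0.50, 20),
    ]
    for label, d, n in scenarios:
        print(f"{label}: d = {d:.2f}, n = {n}, p = {two_sided_p(d, n):.4f}")
```

In this sketch the same trivial effect is far from significant in the small sample and highly significant in the huge one, while the moderate effect in a small sample misses the conventional 5% threshold. The p-value mostly reflects the sample size, which is why it says little on its own about whether a finding matters.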
