Prompt #1: "What documentation in terms of source materials and coding decisions is it reasonable for journals to demand? What is it unreasonable to demand?"
I think the core requirement is that the materials provided should leave “relatively clear” breadcrumbs between the source materials and the coding decisions. In my view, this requires three things: the original “warrant” (or evidence) for the coding, the codebook that identifies how warrants are turned into codes, and the final coding. Essentially, a new person should have the evidence in front of them and the rules for converting evidence into codes, so that they can do the coding on their own and see whether it matches the actual coding of the initial researcher.
So, for example, you might have something like this:
a) Warrant (text from treaty): “In the event of a dispute between any two or more Parties concerning the interpretation or application of the Convention, the Parties concerned shall seek a settlement of the dispute through negotiation or any other peaceful means of their own choice.”
b) Source of warrant: the original coder should also say where the warrant comes from, e.g., text of the UNFCCC Agreement at http://unfccc.int/files/essential_backg ... onveng.pdf
c) Codebook rule on whether a treaty contains a dispute settlement clause: “A treaty is considered to contain a dispute settlement clause if it contains a clause that describes what will happen if a dispute arises between or among Parties to the treaty.”
d) Code: “Contains dispute settlement clause” (or simply “yes” or “1”)
Armed with the warrant and the codebook rule, the new researcher should be able to reproduce the original coder’s coding. That is what I think it is reasonable to expect. It requires considerable care on the part of the original researcher, but it seems to be what is required to allow replication and verification of the original results.
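A minimal sketch of how one such record might be structured, assuming each coding decision is stored as a single warrant–source–rule–code bundle; the field names and the CodingRecord type are purely illustrative, not a standard or my actual format:

```python
from dataclasses import dataclass

@dataclass
class CodingRecord:
    warrant: str        # verbatim evidence from the source document
    source: str         # citation or URL for the warrant
    codebook_rule: str  # the rule used to convert the warrant into a code
    code: int           # the resulting coding decision (e.g., 1 = yes)

# Hypothetical record matching the dispute settlement example above.
record = CodingRecord(
    warrant=("In the event of a dispute between any two or more Parties "
             "concerning the interpretation or application of the Convention, "
             "the Parties concerned shall seek a settlement of the dispute "
             "through negotiation or any other peaceful means of their own choice."),
    source="Text of UNFCCC Agreement",
    codebook_rule=("A treaty contains a dispute settlement clause if it contains "
                   "a clause describing what will happen if a dispute arises "
                   "between or among Parties to the treaty."),
    code=1,  # "contains dispute settlement clause"
)
```

With records like this, a new researcher can hide the `code` field, re-code from the warrant and rule alone, and compare their result to the original.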
Prompt #2: "What are appropriate standards for other researchers' access to datasets built on hand-coded material (possibly including source materials and coding decisions)?"
-- I think we should distinguish between two types of data: a) replication data, and b) the full dataset from which the replication data are drawn. Think of the distinction between 1) evidence, 2) coding, and 3) analysis. Replication data should consist of the evidence and coding for each variable in the analysis. Armed with that, another researcher can check that the codings correspond to the evidence and the coding manual, and that the analysis of the coded data produces the results reported by the author. That allows others to "check the author's work" and seems central to the enterprise. This replication data should be made available simultaneously with publication. The author should create a “hand-offable” dataset of all the evidence needed to evaluate whether the evidence supports the claims in the published work. If the author has met that standard, then others should not be allowed to demand more than that, even if they know there is more underlying data.
But, in most cases, authors generate considerably more data while doing the research than ends up being used in any particular paper. Thus, each article may use only X% of the variables and observations (fields and records) from the overall dataset they have created. The author(s) should not need to hand off this additional data until they have published on it or determined that they do not want to do so.
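A minimal sketch of that distinction, assuming the full hand-coded dataset is held in a pandas DataFrame; all column and file names here are purely illustrative, not any real dataset:

```python
import pandas as pd

# Hypothetical full dataset: many fields and records, only some used in the article.
full_dataset = pd.DataFrame({
    "treaty_id":       [1, 2, 3],
    "dispute_clause":  [1, 0, 1],
    "year_signed":     [1992, 1985, 2001],
    "used_in_article": [1, 1, 0],          # flag marking rows analyzed in the paper
    "draft_variable":  [0.4, 0.7, 0.1],    # unused variable the author may publish on later
})

# Replication data: only the evidence and coding actually behind the published results.
replication_columns = ["treaty_id", "dispute_clause", "year_signed"]
replication_data = full_dataset.loc[full_dataset["used_in_article"] == 1,
                                    replication_columns]
replication_data.to_csv("replication_data.csv", index=False)
```

The replication file is handed off at publication; the remaining fields and records stay with the author until they publish on them or decide not to.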
Prompt #3: "How long is it appropriate to wait to share a very intensive hand-coded research project, if one is planning multiple publications and possibly a book project? Might a scholar provide a description of the data and coding methods, while withholding the actual data and document collections for 2 years post publication? 3 years? 5 years?"
-- I think there is something to be said for a longer period, say 2-3 years but not 5. The reason is that collecting all the SOURCE information in one place is often the hardest part of the task. My IEA Database now contains over 1,200 agreement texts, and that took literally a decade to establish. Now that it is there, it's quite easy to code and manipulate, but that is because I have spent way too much time creating the database and not enough time coding and writing articles from it. Still, there is value to the community in urging people not to hold things too closely.
-- In my experience, my data have been available for years, but the database is so complex that relatively few people have gotten to the really hard nuggets of knowledge it contains, simply because it takes a while to get to know the database. So it would be the rare case in which the database developer did not have a huge advantage over others in using the database appropriately.
-- There might be value in a rule of “adding the database developer as last author.” I am not sure how this works in the natural sciences, but I believe that is, at least in part, how things are handled there. That is, I believe many database developers would be FAR more willing to hand off data if they were offered third, fourth, or fifth authorship on articles. And, of course, this would involve the database developer contributing to the article by gathering, manipulating, and interpreting data in ways that facilitate its publication. So that is something I haven't seen discussed but might warrant consideration and development, perhaps in conversation with natural scientists to see how they address similar situations.
Prompt #4: "Should datasets be freely available online or access contingent on registration and permission? What kind of material should be made available next to the dataset - the full source material, all coding decisions, or less? How should the data be maintained after its publication?"
-- I think there would be value in a central repository, or at least a single meta-database like ICPSR(?), adopted as a standard expectation: ALL journals would require that a meta-database tag be created at ICPSR(?), so that each journal might keep its own data but anybody could identify where datasets were located via a single meta-repository. If there were one-stop shopping for data, that would help a lot.
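A minimal sketch of what such a meta-database tag might contain, assuming each published article deposits one small pointer record at the central index while the data itself stays with the journal; the field names and values are purely illustrative, not an ICPSR format:

```python
# Hypothetical meta-database tag: a pointer record, not the data itself.
meta_tag = {
    "article_doi": "10.xxxx/example",                       # placeholder DOI
    "journal": "Example Journal",
    "dataset_title": "IEA dispute settlement codings",
    "hosted_at": "https://journal.example.org/data/1234",   # hypothetical URL
    "codebook_included": True,
    "source_materials_included": True,
}
```

Searching the central index by topic, journal, or DOI would then point a researcher to wherever each journal actually hosts its datasets.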