
The present paper provides an overview of diary-style research, including descriptions of different methods and the types of research questions for which they are appropriate. Data-analytic methods are described, and recommendations are offered both for analyses and for the preparation of manuscripts describing the results of diary studies.

This article does not concern a study per se; rather, it is intended as an overview of the methods that can be used to conduct what is often referred to as a “diary” or “diary style” study. Given the limitations of a journal article, not all topics are covered in detail. My goal in writing this article was to provide an introduction with the expectation that interested readers who are unfamiliar with diary methods will use this article as a starting point for their further education.

You may notice that this article does not contain many references and citations. This reflects various factors. Many of the points I make could be supported by numerous citations, and selecting one or even a small number of them would be essentially arbitrary and therefore relatively uninformative. On the other hand, some of the points I make reflect my experience conducting diary research since 1973, for which no citations are available.

Nevertheless, researchers interested in learning more about conducting diary studies (or intensive repeated measures designs) might find the following useful:

These are all books to which there may be limited access. I have written a series of articles about diary studies and how to analyze the data collected in diary studies. These are articles in journals and so they might be more accessible:

The term diary study has no clear definition. It is based on the word diary, which is usually defined as something such as a book in which one keeps a daily record of events and experiences. For present purposes, we will consider a diary study to be a study in which participants provide data on a regular basis over an extended period of time. Regular basis may refer to once or a few times each day, once or a few times each week, and so forth. Extended periods of time may consist of a few days, a few weeks, or longer. The frequency with which data are collected and the time over which data are collected need to be often and long enough to provide a sample of people’s lives that is sufficient to provide a basis for making inferences about the topic of the study. Other terms that have been used to describe such data collection protocols are “experience sampling” and “intensive repeated measures.”

By their nature, diary studies concern naturally occurring phenomena, “Life as it is lived” as described by

In some instances, it is the naturally occurring variability in the environment that is the focus of a study. For example, although one can study the mechanisms of ostracism and the processes that underlie reactions to it, it is only through studying naturally occurring ostracism that one can understand the importance of ostracism in people’s lives. What if ostracism occurred so rarely that few people experienced it? By the way, this is not the case (

I am not proposing that controlled laboratory studies are worthless. Rather, I believe that laboratory studies have important limitations and that some of the limitations can be overcome (at least partially) by studies of naturally occurring phenomena. The two types of research can and should complement each other.

Although various typologies to describe diary studies exist, I think

None of these methods is better than the others; each has advantages. Which method to select depends upon the hypotheses or questions of interest. What is the target of inference? What do you want to say?

The most important consideration when designing any study is the nature of the phenomenon in which you are interested. For diary-style studies, this consideration is practical as well as theoretical. Unlike experimental studies in which circumstances are created, diary studies rely on naturally occurring phenomena; if the behavior or state that is the focus of your study does not occur during your study, you will have nothing to study.

Broadly speaking, are you interested in something that is easily recognizable and occurs on a regular basis for most people, such as social interaction? Or are you interested in something that is more subtly defined and may not occur that often for many people, such as transcendental spiritual experiences? The more frequent and common phenomena are, the better suited they are to study using a diary of some kind. Infrequent phenomena that are not experienced by most people can be studied using a diary, but you will have a lot of “extra” data that are not relevant to your interests. In such cases, a single-occasion survey may be appropriate, or perhaps a less intensive data collection protocol such as once a month.

Assuming you are interested in something that can be defined relatively unambiguously and that occurs with some regularity, the next questions concern your specific interest in this entity. Are you interested in how people experience or perceive something, how such perceptions covary with other state-level measures, and/or relationships between dispositional characteristics and perceptions and the covariation among state-level measures? You should design your study to answer such questions; the more clearly you formulate them, the easier it will be to design your study. Certainly, you can start with a vague question such as “Do people feel worse on days when they experience interpersonal stress than on days when they do not?”, but before you conduct your study you will need to define precisely what “feel worse” and “interpersonal stress” mean.

Once you have defined the constructs of interest, it is simply a matter of selecting/designing measures of these constructs. The clearer and more precisely you have defined your constructs, the easier it will be to measure them. If you are having a lot of trouble figuring out how to measure something you may want to re-evaluate the definition of what you are measuring. Designing measures may not be automatic, but it should not be arduous.

It will probably be easier for you to decide how to measure dispositional characteristics such as traits than constructs at the within-person level. Measures of trait-level constructs abound; measuring constructs that you think will vary within persons will probably be more challenging. In my work, I frequently use state-level analogs of trait-level measures. I tend not to use trait-level measures as they were designed, for two reasons. Most important, many trait-level measures are too long to administer on a daily basis or multiple times each day. Also, the wording of trait-level measures is typically not appropriate for diary administration (e.g., people are asked how they typically feel or think, or how they feel or think on average). Fortunately, most trait-level measures have numerous items from which researchers can choose items for use in diary studies.

When creating state level analogs of trait level measures, I examine a trait-level measure and select a few items (typically three or four) that can be reworded for administration at the state-level. When available, I consult factor analyses and use items with higher loadings, assuming they are not highly redundant. Shorter measures may not measure a construct as broadly as a longer measure, but this does not mean that shorter measures are not valid. The validity of diary measures created in this way can be examined using relationships between trait level measures and means of diary measures (e.g.,
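A hypothetical illustration of this kind of validity check: the sketch below simulates a trait score and a short daily analog that is partly driven by the trait, then correlates the trait with the person-level means of the diary measure. All names and parameter values are invented for illustration; Python with numpy/pandas is just one convenient option.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_persons, n_days = 100, 14

# Hypothetical trait scores and daily state reports that are partly
# driven by the trait (all parameter values are invented).
trait = rng.normal(0.0, 1.0, n_persons)
daily = pd.DataFrame({
    "person": np.repeat(np.arange(n_persons), n_days),
    "state": np.repeat(trait, n_days) * 0.6
             + rng.normal(0.0, 1.0, n_persons * n_days),
})

# Validity check: correlate the trait measure with the mean of the
# diary (state) measure, aggregated within persons.
person_means = daily.groupby("person")["state"].mean()
validity_r = float(np.corrcoef(trait, person_means)[0, 1])
print(f"trait vs. diary-mean correlation: r = {validity_r:.2f}")
```

A substantial correlation between the trait measure and the aggregated diary measure is evidence of the short analog's validity, even though the analog samples the construct less broadly.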

When presenting items to participants I emphasize the timeframe to which the item applies. For example, in a study that uses end of day reports, items will start with a stem such as “Today I felt…”. In a beeper study, the stem might be “Now, I feel…”. Such stems help respondents focus their thoughts on the construct that you want to measure.

When designing measures for administration on a within-person basis do not feel bound by the trait-level measure on which you are basing your measures. For example, the Rosenberg Self-Esteem Scale (RSE), one of the most widely used scales in psychology, has 10 items. You probably do not need to use 10 items to measure self-esteem (e.g.,

I know that some of you are thinking: “But the Rosenberg Self-Esteem Scale has 10 items. If you are not asking all 10 items you are not measuring self-esteem!!” I understand such thinking, but such a belief confuses a construct with the measure of a construct. Although scores on the RSE are measures of self-esteem, scores on the RSE are not self-esteem.

Regardless of how you decide what items to include, you need to keep in mind the total number of items you are asking participants to answer. At some point (I wish that I could say exactly when), participants will become overloaded, and they will stop making the distinctions that you want them to make. I have seen beeper studies in which participants were asked to make 30 or more judgments such as emotions just a few hours apart. I know that participants provided answers (the data were presented); I am not certain that they distinguished the items as carefully as the researchers hoped or assumed they did.

Keep in mind that people have fixed or limited cognitive resources, and each answer requires some of these resources. If you ask participants to provide too many responses, they will not have enough resources to respond to each item thoughtfully. They may provide answers, but as the number of questions increases these answers are likely to increasingly reflect the influence of some dominant underlying dimension such as the hedonic dimension. For example, as discussed in

With good reason, researchers are expected to provide an estimate of the power of their designs. Unfortunately, it is not easy to estimate the power for the multilevel modeling analyses (MLM) that are standard for analyzing the data produced in diary style studies (described in the next section). As explained in

Although Nezlek and Mroziński do not provide specific advice regarding sample sizes, based on my experience (and the simulations conducted by Nezlek and Mroziński), I offer the following advice, which assumes a design in which days are nested within persons. Note that these estimates are for the sample sizes of the final sample that will be used for analysis. Researchers should anticipate that some percent (perhaps 10%) of participants will be excluded because they do not comply with the research protocol, and similarly, some percent of day-level observations will be excluded.

If hypotheses concern only relationships between a mean of a daily measure and person level measure (such as a trait), 50 participants and 7 days should be adequate. If hypotheses concern only within-person relationships between a single predictor and an outcome and do not concern cross-level interactions involving slopes, 100 participants and 10 days should be adequate. Finally, if analysts are interested in cross-level interactions (i.e., modeling individual differences in Level-1 slopes) researchers will need to include at least 125 participants who provide at least 14 days of data.
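Although these are only rules of thumb, a rough Monte Carlo check of them can be sketched. The code below (Python; all parameter values are illustrative guesses, not recommendations) uses the simple two-step approach (per-person OLS slopes, then a one-sample t-test on those slopes) as a cheap stand-in for a full multilevel analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sim_power(n_persons, n_days, slope=0.2, slope_sd=0.1,
              noise_sd=1.0, n_sims=200):
    """Rough Monte Carlo power check for a mean within-person slope.

    Uses the two-step approach (per-person OLS slopes, then a
    one-sample t-test on those slopes) as a cheap stand-in for a full
    multilevel model. All default parameter values are illustrative.
    """
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, 1.0, (n_persons, n_days))
        b = rng.normal(slope, slope_sd, n_persons)[:, None]  # person slopes
        y = b * x + rng.normal(0.0, noise_sd, (n_persons, n_days))
        # Per-person OLS slope: cov(x, y) / var(x), computed row-wise.
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        slopes = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)
        if stats.ttest_1samp(slopes, 0.0).pvalue < .05:
            hits += 1
    return hits / n_sims

# e.g., estimated power for 100 persons providing 10 days each
print(sim_power(100, 10))
```

Varying `n_persons`, `n_days`, and the assumed effect sizes gives a quick sense of how power responds to design choices, which is the spirit, if not the letter, of a formal simulation-based power analysis.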

I stress that these recommendations are guidelines that can be useful for planning purposes. They are not formal recommendations. Formal power analyses require having a priori estimates of numerous parameters, which can be based on previous studies. If this is not possible, I recommend calculating achieved power after the fact. I should add that many well-intentioned editors and reviewers blindly request (or demand) a priori estimates of power without a full understanding of what is needed to provide such estimates.

There are also the issues of recruiting participants and maximizing compliance with a research protocol. For researchers who rely on students who participate in research as part of a course requirement, recruiting is not particularly difficult. The researcher primarily needs to be certain that the compensation (e.g., credits) is equitable. Recruiting participants from the general public can be done online or through social networks. Regardless, I recommend that researchers be as transparent as possible regarding the study. For example, my colleagues and I have told potential participants that a study is about daily experience (or social interaction), and that many participants in the past have found the study to be enjoyable and informative. This last point is in fact true. Finally, I emphasize the fact that the study will take only a few minutes (less than 10) each day, a limit to which I have adhered.

Researchers need to design study protocols so that compliance with a protocol does not interfere with participants’ lives too much; otherwise, participation will destroy what a study is examining. The most important consideration is how much time answering the questions will take. Pre-testing (not using research assistants or colleagues) can help determine this. Remember, although participants may be interested in a study, they are probably not as interested as you are, and they are most certainly not interested in the subtleties in which you are interested. Interference can also be considered in terms of the number (and likely/possible circumstances) of assessment occasions each day.

Compliance with a protocol can be understood in two ways: do participants merely provide responses, or do they actually answer the questions? For example, although participants who are asked to provide 40 ratings multiple times each day may provide these ratings, I would not be confident that all of these answers were provided thoughtfully. As mentioned previously, as the total number of items increased in my social interaction diary studies, the correlation between two measures increased. I cannot offer an all-purpose recommendation other than to be sensitive to the possibility that lengthy protocols will undermine the validity of all responses.

Multilevel modeling (MLM) is the current standard for best practice to analyze diary style data. Although a full discussion of using MLM is beyond the scope of this article, I present a brief description here. Interested readers can consult

When describing MLM I rely on

Within-person: y_{ij} = β_{0j} + r_{ij}

Between-person: β_{0j} = γ_{00} + u_{0j}

In this model, there are no predictors at either level. Each person's mean is β_{0j}, and the overall mean is γ_{00}. The variance of r_{ij} is the Level-1 (or within-person) variance, and the variance of u_{0j} is the Level-2 (or between-person) variance. Note that the intercept from Level 1 (β_{0j}) becomes an outcome at Level 2. In essence, a set of Level-1 coefficients (in this case only a mean) is estimated for each Level-2 unit (person in this example), and then these coefficients are analyzed at Level 2. It is important to note that in reality, all these parameters are estimated simultaneously.

Assuming that you are using some type of MLM, the first consideration is the number of levels in the analyses. What is nested within what? Typically, decisions about the number of levels in a design are straightforward and are usually dictated or suggested by data. In a diary study, this is usually occasions (days, interactions, etc.) nested within persons.

It is useful to keep in mind that each level of analysis represents a sample, and if there are not enough units of observation at a level of analysis, it will not be possible to model the variance at that level of analysis. For example, assume students from three schools are measured once a day for two weeks. Conceptually, this could be considered a three-level model (days within students and students within schools). Nevertheless, three schools do not provide enough information to estimate the random variability associated with sampling schools; three schools do not constitute a sample of schools. In such a case, school can become a person-level variable in a two-level model.

The same considerations apply to within-person sampling. For example, assume participants maintain a daily diary twice, perhaps before and after an intervention of some kind. One might be tempted to nest days within times (pre vs. post) and times within persons. Although appealing in some ways, such a model would not be the best approach. Two time periods do not constitute a sample of time periods. In such a case, time of assessment (pre vs. post in this example) would be represented at the day level as a fixed effect.

Often, researchers conduct studies in which multiple observations are collected each day for each person. The classic example of this is the “beeper study” described previously. Typically, the data from such studies are treated as two-level models in which occasions of measurement are nested within persons. The fact that observations are nested within days is ignored. I do not think this is good practice, and if you collect such data, I encourage you to consider analyzing them with three-level models (observations nested within days and days nested within persons). You may have trouble estimating some of the random effects, but your estimates will have taken into account the possibility that what appear to be occasion-level relationships are in fact day-level relationships.

Broadly speaking, I urge you to ignore advice about using ICCs (intra-class correlations) as a method of determining whether nesting is appropriate or not. An ICC describes the distribution of variances, and some argue that if there is no variance at a specific level of analysis that level of analysis can be ignored. Although apparently sensible, this criterion ignores the possibility that relationships between two variables may vary across units of analysis when means do not.

For example, consider the following data set:

| Group 1 | | Group 2 | | Group 3 | |
|---|---|---|---|---|---|
| x | y | x | y | x | y |
| 1 | 5 | 1 | 5 | 1 | 5 |
| 2 | 4 | 2 | 4 | 2 | 4 |
| 3 | 3 | 3 | 3 | 3 | 3 |
| 4 | 2 | 4 | 2 | 4 | 2 |
| 5 | 1 | 5 | 1 | 5 | 1 |

| Group 4 | | Group 5 | | Group 6 | |
|---|---|---|---|---|---|
| x | y | x | y | x | y |
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 | 3 | 3 |
| 4 | 4 | 4 | 4 | 4 | 4 |
| 5 | 5 | 5 | 5 | 5 | 5 |
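These data can be used to verify the point numerically. The short sketch below (Python; simply a transcription of the table) shows that every group mean of y is identical, so a means-based ICC would suggest that the grouping can be ignored, even though the within-group slopes are perfectly negative in Groups 1-3 and perfectly positive in Groups 4-6.

```python
import numpy as np
import pandas as pd

# Transcription of the table above: six groups, x from 1 to 5.
df = pd.DataFrame({
    "group": np.repeat(np.arange(1, 7), 5),
    "x": np.tile(np.arange(1, 6), 6),
    "y": np.concatenate([np.arange(5, 0, -1)] * 3 + [np.arange(1, 6)] * 3),
})

# Every group mean of y is 3.0, so there is no between-group variance
# in y and a means-based ICC would be zero.
means = df.groupby("group")["y"].mean()
print(means.tolist())

# Yet the within-group slopes differ dramatically: negative in
# Groups 1-3 and positive in Groups 4-6.
slopes = {g: np.polyfit(d["x"], d["y"], 1)[0] for g, d in df.groupby("group")}
print(slopes)
```

An ICC of zero for y would thus hide the fact that the x-y relationship varies sharply across groups, which is exactly what a multilevel analysis of slopes would detect.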

Within the multilevel context, when predictors are selected, decisions about how they will be centered need to be made. Centering refers to the reference value from which the deviations of a predictor are taken. In OLS regression, this is usually the mean for a variable. The situation is somewhat more complicated in MLM analyses. I discuss this issue in terms of two-level models, e.g., days nested within persons. More details about centering can be found in the references previously provided and in

At Level 2, there are two options: grand-mean centering and uncentered (also referred to as zero-centered). When a Level-2 predictor is entered grand-mean centered, deviations are taken from the grand mean, and the intercept represents the expected value for an observation that has a value that is at the grand mean of the predictor. When a Level-2 predictor is entered uncentered, deviations are taken from 0, and the intercept represents the expected value for an observation that has a value of 0 on the predictor.

At Level 1, there are three options: grand-mean centering, uncentered, and group-mean centering. Similar to Level 2, when a Level-1 predictor is entered grand-mean centered, deviations are taken from the grand mean, and the intercept represents the expected value for an observation that has a value that is at the grand mean of the predictor. Also similar to Level 2, when a Level-1 predictor is entered uncentered, deviations are taken from 0, and the intercept represents the expected value for an observation that has a value of 0 on the predictor. The third option that is available for Level-1 predictors is group-mean centering. When a Level-1 predictor is entered group-mean centered, deviations are taken from the group mean, and the intercept represents the expected value for an observation that has a value that is at the group mean of the predictor.

Note that “group” in this instance refers to the Level-2 unit of analysis. In a diary study in which days are nested within persons, groups are persons. When a predictor is group-mean centered, the intercept represents each person's mean on the outcome measure. Group-mean centering is the multilevel equivalent of conducting a regression analysis for each group (person) and then analyzing the coefficients generated by these analyses as outcomes in person-level analyses.

For Level-2 predictors, I recommend grand-mean centering continuous measures and zero-centering categorical measures. This makes interpreting the coefficients and generating predicted values (see below) relatively straightforward. For Level-1 predictors, I recommend group-mean centering continuous measures and zero-centering categorical measures. Note that grand-mean centering Level-1 predictors introduces Level-2 variance into the Level-1 model because the reference point for a grand-mean centered predictor is the grand mean, which represents the mean of all observations.
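The two Level-1 centering options for continuous predictors can be illustrated in a few lines. In the sketch below (Python with pandas; the data and variable names are invented), note that group-mean centering removes all between-person variance from the predictor: each person's mean on the centered predictor is exactly zero.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# A small invented data set: three persons, four days each.
df = pd.DataFrame({
    "person": np.repeat([1, 2, 3], 4),
    "stress": rng.integers(1, 6, 12).astype(float),
})

# Grand-mean centering: deviations from the mean of all observations.
df["stress_grand"] = df["stress"] - df["stress"].mean()

# Group- (person-) mean centering: deviations from each person's own
# mean, which removes all between-person variance from the predictor.
df["stress_group"] = df["stress"] - df.groupby("person")["stress"].transform("mean")

# Each person's mean on the group-centered predictor is exactly zero.
print(df.groupby("person")["stress_group"].mean().tolist())
```

This is why group-mean centering yields a purely within-person coefficient, whereas grand-mean centering mixes within- and between-person variance in the same predictor.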

Modeling error is another aspect of MLM that can be puzzling for analysts whose primary experience is with OLS regression. In contrast to OLS regression in which there is only one error term, in MLM, each Level-1 predictor can have its own error term. Moreover, the covariances among these individual error terms are also estimated. Collectively, the individual error terms and their covariances are referred to as the “error structure.” When thinking about modeling error it is important to keep in mind that how error is modeled can affect estimates of fixed effects, which are typically the focus of hypotheses. So, even though error structures are rarely the focus of hypotheses, you need to ensure that error is modeled properly to ensure that tests of your fixed effects are accurate.

In MLM, Level-1 coefficients can be modeled in one of three ways. (1) Randomly varying: a fixed effect and a random effect (error term) are estimated. (2) Non-varying: a fixed effect is estimated but no random effect is estimated. (3) Non-randomly varying: a fixed effect is estimated, no random effect is estimated, but there is a Level-2 predictor of the Level-1 coefficient. It is important to note that the absence of a random effect (#2) does not mean that an effect does not vary. It means that the random effect cannot be estimated reliably, or in other words, that true variance cannot be distinguished from error variance.

In most software packages, individual error terms for each coefficient are tested for significance. I recommend dropping error terms that are not significant – I see no reason to estimate parameters that cannot be estimated reliably. Noting this, I recommend using a more generous significance level than .05 (at least .10, perhaps .15) when making decisions about including random error terms. This second recommendation reflects the fact that conceptually, most coefficients are random (they represent a coefficient from a sample of coefficients), and this randomness should be modeled if at all possible. If you are concerned about the influence of fixing an effect (i.e., not estimating a random error), run the model with and without the random error term to see how the results differ.

Generally speaking, I recommend not spending too much time describing error structures when writing up the results of analyses. As I said previously, hypotheses rarely concern error structures per se. There are some differences in the strength of inference between randomly varying coefficients and coefficients that do not vary randomly (e.g., the calculation of confidence intervals is more problematic if there is no random error term), but for most intents and purposes, this difference is not important. Finally, how error terms are specified varies across software packages, and analysts need to be careful that their models are estimating the error structures they want to estimate.

Regardless of the specific structure, I recommend starting your analyses with “totally unconditional” models of each of the measures. Totally unconditional refers to the fact that there are no predictors at any level of analysis. Such a model was presented at the beginning of the section “Logic of MLM.” Such models provide the basic descriptive statistics of an MLM: the mean and the variance estimates at each level of analysis. Although such models typically do not test hypotheses per se, they do provide valuable information about the distribution of variances, information that can be used to guide and evaluate further analyses.

Assuming a two-level model, I recommend constructing the Level-1 model and then examining differences in the Level-1 coefficients at Level 2. For example, if days are nested within persons, the Level-1 model would describe day-level (or within-person) relationships, and the Level-2 model would describe relationships between person-level measures such as personality traits and the phenomena represented by the Level-1 coefficients. When constructing models, it is important to recognize that hypotheses regarding coefficients are tested against a null of 0.

For example, assume you want to model daily self-esteem as a function of daily stress. The model would look like this:

Within-person: y_{ij} = β_{0j} + β_{1j} * Stress_{ij} + r_{ij}

Between-person: β_{0j} = γ_{00} + u_{0j}

β_{1j} = γ_{10} + u_{1j}

Note that the relationship (slope) between self-esteem (y) and stress (β_{1j}) now becomes an outcome at Level 2. The null hypothesis is that the mean slope (γ_{10}) between self-esteem and stress is 0. This model can be extended to include a predictor of the slope, creating what is sometimes called a “slopes as outcomes” analysis or a cross-level interaction.

In the following model, the slope between daily self-esteem and stress is modeled as a function of extraversion. The model is presented below, and whether the esteem-stress slope varies as a function of extraversion is tested by the significance of the γ_{11} coefficient.

Within-person: y_{ij} = β_{0j} + β_{1j} * Stress_{ij} + r_{ij}

Between-person: β_{0j} = γ_{00} + γ_{01} * Extraversion_{j} + u_{0j}

β_{1j} = γ_{10} + γ_{11} * Extraversion_{j} + u_{1j}

Constructing a Level-1 model consists of deciding which predictors you want to include, and this includes deciding how each predictor will be centered and whether its coefficient will be modeled as random.

Regardless, particularly at Level 1, be conservative in terms of adding predictors. I recommend using what are called “forward-stepping” procedures, in which predictors are added to models one at a time and checked for significance. Of course, groups of predictors can also be added in sequence. This is the opposite of what are called “backward-stepping” procedures, which start with the most complex model and delete terms. Backward-stepping procedures are commonly used in single-level regression analyses.

The reason for this recommendation is based on the number of parameters that are estimated in an MLM analysis. MLM estimates more parameters than a comparable OLS regression does, and the number of parameters increases non-linearly as a function of the number of predictors. For example, the simple model, Y_{ij} = β_{0j} + r_{ij}, estimates three parameters: the mean and the two variance estimates. Adding a predictor, Y_{ij} = β_{0j} + β_{1j}(x) + r_{ij}, estimates six parameters: fixed and random effects for the intercept and the predictor, the correlation between the two random terms, and the Level-1 variance. Adding a second predictor, Y_{ij} = β_{0j} + β_{1j}(x_{1}) + β_{2j}(x_{2}) + r_{ij}, estimates 10 parameters: fixed and random effects for the intercept and the two predictors, the correlations among the three random terms, and the Level-1 variance. As you can see, adding a predictor requires adding more than one parameter, and as the total number of predictors increases, the number of parameters each predictor requires also increases. In contrast, in OLS regression, adding a predictor requires the estimation of only one more parameter.
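This counting rule is easy to verify. Assuming the intercept and all q Level-1 predictors are modeled as randomly varying, a small function reproduces the 3, 6, 10 sequence above:

```python
def mlm_param_count(q):
    """Parameters estimated when the intercept and all q Level-1
    predictors are modeled as randomly varying: fixed effects,
    random-effect variances, their covariances, and the Level-1
    residual variance."""
    k = q + 1                        # intercept plus q predictors
    fixed = k                        # one fixed effect each
    variances = k                    # one random-effect variance each
    covariances = k * (k - 1) // 2   # covariances among random effects
    return fixed + variances + covariances + 1  # +1: Level-1 variance

print([mlm_param_count(q) for q in range(4)])  # [3, 6, 10, 15]
```

Note that a third random predictor would already require 15 parameters, which makes concrete why models should be kept “lean and tight.”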

It is important to consider the number of parameters a model estimates because you need to be careful about trying to estimate more parameters than your data can support. The more data you have, the more parameters you can estimate. Nonetheless, to my knowledge, there are no quick and easy guidelines for this. As discussed, I recommend starting simply and keeping models “lean and tight”: opt for fewer parameters that are estimated well rather than many parameters that are estimated poorly.

Once the Level-1 model is finalized (including the error structure), then predictors can be added at Level 2. Although adding predictors at Level 2 does not involve the same type of increases in the number of parameters estimated by adding predictors at Level 1, I still encourage analysts to use forward-stepping algorithms. I believe that the possible risk of inflating Type-I error posed by forward stepping is outweighed by the improvement in the quality of the parameters that are estimated.

Particularly for beginning or less experienced analysts I strongly recommend using the program HLM (

I know that some of you, perhaps many, are thinking: “But what about R? If I don’t use R people will think I am naïve.” Let us put aside social pressure and discuss reality. R is a very powerful platform, and for experienced users it provides numerous, sophisticated options that are not available on other platforms. Unfortunately, it also provides numerous opportunities for analysts to misspecify models, and R modules have few (if any) safeguards built in. I will not belabor this point. MLM can be conducted using numerous software packages (SPSS, SAS, Mplus, Stata, R, and others), and if the same model is specified, these packages should produce essentially the same results.

Preparing manuscripts describing the results of diary-based studies requires a bit more attention to some details than preparing a manuscript describing an experimental study or a single level regression analysis. These differences are not dramatic, but they can be important for readers (and reviewers). A thorough description of what was done and what was found that is not cluttered with extraneous detail will increase readers’ appreciation of the contribution of a piece of research. Much of what I describe here is standard good practice, but I thought some of it was worth repeating. For a discussion of some of these issues within the context of social and personality psychology, see

A brief description of participants, including how they were recruited and whether they were compensated. The specific description will vary as a function of the study at hand, but at the least some basic demographic variables should be provided: age, sex, employment, education, and so forth. There is no reason to spend too much time on this; a few brief sentences should be enough.

A description of what data were deleted from the analyses and why. Diary studies present different demands than single-occasion studies, and dropout or exclusion (bad data) rates are typically higher in diary studies than in single-occasion studies. I have deleted as many as 10% of participants from the primary analyses of a diary study. Deleting participants is not a crime or a sin; by contrast, deleting participants and not disclosing the fact that they were deleted, or not providing a clear rationale for their deletion, is not ethical. Similarly, individual observations (e.g., days in a daily diary study) can also be deleted for various reasons, typically because an entry was not made at an appropriate time (e.g., the middle of the afternoon for a study that asks for end-of-day reports). Researchers need to establish criteria for inclusion/exclusion at both the within- and between-person levels.

A clear description of the data that were collected. This includes the specific questions that were asked and the response scales that were used. You need to provide enough detail that someone could repeat your study. Sometimes this can be done with a figure, and sometimes these details can be included in online supplemental materials. Regardless of how, these details need to be available.

Start the results with a description of the basic analytic framework. For example, in a daily diary study, days could be treated as nested within persons; in a social interaction study, social interactions might be nested within persons. The first model should be a totally unconditional model, i.e., a model with no predictors at any level of analysis. Such basic models can then be used to generate the basic descriptive statistics of an MLM: the mean and the variance estimates at each level of analysis. Note that in MLM there is more than one variance estimate. I urge you to be cautious when reporting correlations at the within-person level. First, keep in mind that correlations between means that are aggregated across occasions of measurement do not describe within-person relationships. Second, correlations in which all the Level-1 observations in a study (e.g., diary entries) are treated as a single sample are not accurate because they confound between- and within-person variances. It is possible to estimate within-person correlations using some specialized software packages (e.g., Mplus), but describing how to do this is beyond the scope of this article.
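As a concrete illustration of this first step, here is a minimal sketch, in Python with statsmodels and simulated data (not data from any actual study), of fitting a totally unconditional model to daily diary data and extracting the mean and the variance estimates at each level of analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulate a daily diary: 100 persons x 14 days of a mood rating.
# True between-person variance = 1.0 (SD 1.0); within-person = 2.25 (SD 1.5).
n_persons, n_days = 100, 14
person_means = rng.normal(5.0, 1.0, n_persons)
mood = np.repeat(person_means, n_days) + rng.normal(0, 1.5, n_persons * n_days)
df = pd.DataFrame({
    "person": np.repeat(np.arange(n_persons), n_days),
    "mood": mood,
})

# Totally unconditional (intercept-only) model: days nested within persons.
model = smf.mixedlm("mood ~ 1", df, groups=df["person"]).fit()

between_var = model.cov_re.iloc[0, 0]  # Level-2 (between-person) variance
within_var = model.scale               # Level-1 (within-person) variance
icc = between_var / (between_var + within_var)
print(f"grand mean = {model.params['Intercept']:.2f}")
print(f"between-person variance = {between_var:.2f}")
print(f"within-person variance  = {within_var:.2f}")
print(f"ICC = {icc:.2f}")
```

The intraclass correlation (the share of the total variance that lies between persons) follows directly from the two variance estimates, which is one reason these "empty" models are a useful starting point.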

Describe how predictors were entered and how error was modeled. Following the presentation of the descriptive statistics, the sequence of the models that are presented will vary as a function of the focus of the study. Regardless, it is important to state clearly how each predictor was entered (e.g., how it was centered) and whether its effect was modeled as fixed or random.
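One recurring decision when entering Level-1 predictors is centering. The following sketch (pandas, with made-up numbers) contrasts grand-mean centering with person-mean centering, the latter of which isolates the within-person component of a predictor:

```python
import pandas as pd

# Hypothetical daily stress ratings for two participants (made-up numbers).
df = pd.DataFrame({
    "person": [1, 1, 1, 2, 2, 2],
    "stress": [2.0, 4.0, 6.0, 1.0, 2.0, 3.0],
})

# Grand-mean centering: deviation from the overall mean across all observations.
df["stress_gmc"] = df["stress"] - df["stress"].mean()

# Person-mean (group-mean) centering: deviation from each person's own mean.
# After this transformation, each person's centered scores average to zero,
# so the predictor carries only within-person variation.
df["stress_pmc"] = df["stress"] - df.groupby("person")["stress"].transform("mean")

print(df)
```

Whichever choice is made, the point for manuscript preparation is simply that it must be reported, because the meaning of the intercept and of the coefficients depends on it.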

Avoid number clutter. When describing the results of analyses, it usually suffices to report the coefficients (what the hypotheses invariably concern) and the associated standard errors or tests of significance.

Explain results with predicted values. Often, the results of MLM analyses can be difficult to understand based on coefficients alone. For example, the meaning of a cross-level interaction (slopes as outcomes analysis) may not be readily apparent from the coefficients themselves. Moreover, MLM estimates unstandardized coefficients (despite well-intended but sometimes misguided suggestions about how to standardize them), and so the meaning of a coefficient needs to be evaluated within the context of the variances of the measures that are in a model. I have found it useful to estimate predicted values at +/- 1 standard deviation on the predictors of interest and to present these predicted values in a figure or table.
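For example, with hypothetical fixed-effect estimates (all numbers made up for illustration) from a cross-level interaction model, predicted values at +/- 1 SD on each predictor can be computed directly from the coefficients:

```python
# Hypothetical fixed effects from a model in which daily stress
# (Level-1, person-mean centered) predicts daily mood, moderated by
# trait anxiety (Level-2, grand-mean centered). All values are invented.
b0 = 5.00        # intercept: predicted mood at average stress and anxiety
b_stress = -0.40 # within-person stress slope
b_anx = -0.30    # between-person anxiety effect
b_cross = -0.15  # cross-level interaction: anxiety moderates the stress slope

sd_stress, sd_anx = 1.2, 0.8  # observed SDs of the two predictors

# Predicted mood at +/- 1 SD on each predictor.
preds = {}
for anx_label, anx in (("low anxiety", -sd_anx), ("high anxiety", sd_anx)):
    for stress_label, stress in (("low stress", -sd_stress), ("high stress", sd_stress)):
        yhat = b0 + b_stress * stress + b_anx * anx + b_cross * anx * stress
        preds[(anx_label, stress_label)] = yhat
        print(f"{anx_label}, {stress_label}: predicted mood = {yhat:.2f}")
```

Laid out this way (four predicted values rather than four coefficients), the interaction is immediately interpretable: the stress-mood slope is steeper for the high-anxiety group than for the low-anxiety group.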

The use of diary style methods has increased markedly over the past few decades. What was once innovative is now commonplace. Nevertheless, rapid growth in any domain is often accompanied by “growing pains.” For example, my sense is that many researchers do not have a sufficient understanding of the multilevel analyses that are commonly used to analyze diary-style data. Moreover, as analytic options become more complex (e.g., multilevel mediation, multilevel structural equation modeling), this problem is likely to become more pronounced.

In this regard, it is important to distinguish the ability to conduct an analysis from knowing whether an analysis is correct and what the results mean. Consulting established experts, such as Raudenbush and Bryk, Kreft and deLeeuw, Hox, and Goldstein (to name a few), can help researchers understand the rationale for MLM. Without understanding why an analysis is done, researchers run the risk of running models that are technically correct but that do not answer the questions a study was designed to answer.

I also see challenges (and opportunities) in terms of deciding what types of methods are appropriate for what types of questions/topics. “One size does not fit all.” Some topics such as mood variation might be better studied using a frequent measurement strategy (e.g., multiple times a day), whereas other topics such as self-evaluation might be better studied on a daily basis. There are no right or wrong answers for such decisions. Researchers need to define carefully the constructs in which they are interested and develop measures of these constructs accordingly.

I also encourage researchers to consider event-contingent methods, i.e., examining characteristics of and reactions to specific types of events/occurrences. Contemporary research has relied heavily on various types of interval-contingent methods (e.g., end of day, random measures during a day), but for events/occurrences that are easily recalled and frequent enough to provide a basis for inference, event-contingent protocols may be appropriate. Moreover, interval- and event-contingent protocols can be combined in the same study. Regardless of the specific protocol, researchers need to be careful to avoid asking for so many responses that the validity of individual responses is compromised. Participants may provide responses, but this does not necessarily mean they have provided answers.

Diary methods are tools that can be used to examine a wide variety of questions. The only limit is researchers’ imagination. I hope that this article has provided some information that can help researchers transform their ideas into reality.

Preparation of this paper was supported by grant 2018/31/B/HS6/02822 awarded to John Nezlek from the Polish National Science Centre (Narodowe Centrum Nauki).

The author declares no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The author has no support to report.