Our dataset is a time-series panel at the app-month level. We conduct the analysis at the app level because this allows us to account for app-level factors that influence product strategies, and it is in line with prior research on awards (Kovács and Sharkey 2014). We considered the Google Play Award in the years 2016, 2017, and 2018. We identified all apps nominated for the award in these years from the official Android Developers Blog run by Google. We removed apps from the sample that were nominated in consecutive years (see Online Appendix C). We obtained monthly data (e.g., on app ratings, updates, and new app releases) from AppBrain, App Annie, and the Internet Archive.
We use data on acquisitions from CrunchBase. These data contain a profile for each company, which includes a text description of the company's business area, information about the number of employees, the date of founding, sources of venture financing, and a list of acquisitions made by each company. The dataset does not contain SIC-based industry classifications but does contain tags that describe the general business area of each company (e.g., Cloud Storage).
In the field studies (Experiments 1 and 6), we targeted approximately 50–100 responses per cell, based on available funds and participant availability. In the MTurk studies (Experiments 2–5), the targeted number of responses per cell was 100–200.
To test this hypothesis, we conducted a field experiment in a mom-and-pop bakery. We picked this bakery because it has a strong social media presence and is described by bloggers as an “Insta-worthy” spot because of its aesthetically pleasing interiors and unique desserts.
Participants (n = 398, 170 female, mean age = 37.19 years, SD = 13.01) recruited via MTurk completed an online survey in exchange for monetary compensation.
Five hundred twenty-four members of the MTurk online panel (51% men; Mage = 36.79 years; 99% native English speakers) participated for a nominal fee.
We obtain company financial data from Compustat and patent data from Kogan et al. (2017).18 Data on material subsidiaries disclosed in Exhibit 21 of Form 10-K are from Dyreng and Lindsey (2009) and Dyreng et al. (2013).19 Data on state statutory tax rates and state R&D tax credits are from the Federation of Tax Administrators and Wilson (2009).
We begin with the sample of U.S. firms between 1997 and 2005 in Compustat. Our sample period starts in 1997, two years before the first state adopted an addback statute in 1999. Kogan et al. (2017) provide patent data matched with CRSP firms up to 2010. Patents filed in or before 2008 would most likely have been granted by 2010.20 Therefore, the sample for our primary tests ends in 2005, because we examine the number of patents filed three years ahead.
I study the BCFF survey of yield and macroeconomic forecasts over the period from January 1985 through December 2018, with the start date determined by data availability. Of the 194 forecasters, 115 are categorized as financial institutions, 48 are consulting firms, and 31 represent other types of institutions. The forecasters submit forecasts of investment yields on U.S. Treasury bonds that have maturities of six months and one, two, three, five, seven, and 10 years. From these raw data, survey forecasts of zero-coupon bond yields of matching maturities are constructed as in Le and Singleton (2012). Survey data are released monthly at the beginning of the following month (usually the first business day), based on information collected over a two-day period (typically scheduled between the 20th and the 26th of the month). Disagreement is measured as the difference between the 90th and 10th percentiles of the cross-sectional distribution of BCFF zero-coupon yield forecasts.15
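For illustration, a minimal pandas sketch of this disagreement measure, assuming a hypothetical forecaster-level table (the column names are ours, not the BCFF release format):

```python
import pandas as pd

# Hypothetical forecaster-level input: one row per month, maturity, and
# forecaster with the zero-coupon yield forecast (column names are ours).
forecasts = pd.DataFrame({
    "month":      ["2018-12"] * 4,
    "maturity":   ["10y"] * 4,
    "forecaster": ["A", "B", "C", "D"],
    "yield_fcst": [3.10, 3.25, 2.95, 3.40],
})

# Disagreement: 90th minus 10th percentile of the cross-sectional
# distribution of forecasts for each month-maturity pair.
disagreement = (
    forecasts.groupby(["month", "maturity"])["yield_fcst"]
    .apply(lambda x: x.quantile(0.90) - x.quantile(0.10))
    .rename("disagreement")
)
print(disagreement)
```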
Our hedge fund sample is from the Lipper TASS database. TASS classifies hedge funds into 11 strategy categories: convertible arbitrage, dedicated short bias, emerging markets, event driven, equity market neutral, fixed income arbitrage, funds of funds, global macro, long/short equity, managed futures, and multi-strategy. Since our sentiment measure corresponds largely to U.S. stock markets, we focus on U.S. equity-oriented hedge funds and drop emerging markets, fixed income arbitrage, and managed futures. Dedicated short-bias funds are also excluded since only 42 such funds satisfy our data filters.8 The sample is free of survivorship bias, as TASS covers both live and defunct hedge funds since 1994 and we examine the period from 1994 onward.
We incorporate data from three sources. First, we collect data on employee perceptions of their employing firms and senior management from Glassdoor.com. Glassdoor.com is a website that allows employees to anonymously provide their perceptions of the firm, senior management and various other aspects of working for a firm. […] To measure news coverage of tax avoidance, we hand-collect data on news coverage of S&P 500 firms’ tax avoidance activities. We searched for news about “tax evasion,” “tax avoidance,” “tax haven” and each company’s name in LexisNexis. We focus on corporate income tax avoidance (see Appendix C for a list of instructions), though we may unintentionally collect other forms of corporate tax avoidance due to human error in hand-collection (e.g., payroll tax avoidance). Our media sources include all worldwide news media sources (e.g., “newspapers,” “news,” “newsletters”) in LexisNexis. Finally, we rely on Compustat Quarterly to incorporate financial statement-based controls.
We pretested to identify brands relevant to our participant population with a high degree of variability in terms of attitudes. One hundred members of the Amazon Mechanical Turk (MTurk) online panel (60% men; Mage = 34.66 years) were paid a nominal fee to list ten brands they thought were “really cool” and ten brands they “would never use,” following the procedures outlined by Escalas and Bettman (2003).
We conducted a pretest to identify hypothetical brand names that varied on name gender while being equally and generally devoid of meaning and unfamiliar to participants. We generated test name stimuli drawing on prior work (Klink 2009; Lowrey and Shrum 2007). In addition to 55 potential name stimuli, a set of 10 words selected on a priori grounds to carry some semantic associations were included to provide a benchmark for comparison and encourage participants to use the full range of response scales (Schmitt, Pan, and Tavassoli 1994; see Web Appendix E).
We conducted a pretest to identify names that vary on gender score while being equal in length, equally and generally devoid of meaning, and unfamiliar to participants.
In the dataset, the pre-award period begins in January of each year and ends in March. Nominations are announced in April, which is why we exclude this month from the analysis. Across the years studied, the nomination and award-conferral dates fall in the same month of the year. The post-award period begins in May and ends in March of the following year so that it does not overlap with the next year's award period. The length of the post-award period involves a trade-off between variance in the dependent variables, which increases as we extend the post-award period, and capturing the immediate effects of the award. Because we removed app developers who were nominated in subsequent years, little bias is to be expected, but we additionally restrict the post-award period to end in March to avoid any potential carryover effect (see Online Appendix Figure A2).
We collect all daily ratings for the firms in the 2012 S&P 500. We are unable to identify Glassdoor information for five of these firms and thus retain 495 of the 500 firms in our sample. Our final sample spans all calendar quarters from January 2008 (Q1) to December 2017 (Q4).7
The final sample size differs depending on the control group. In the base setup (i.e., runners-up as the control group), the sample comprises 125 developers (30 winners and 95 runners-up) and their 793 apps, resulting in an unbalanced panel of 5,131 app-months. In the matched sample setup based on coarsened exact matching, given the procedures outlined below, the sample comprises 8,414 app-months. In the matched sample setup based on propensity score matching, the sample comprises 8,126 app-months.
The CrunchBase dataset contains information on a large number of companies across a variety of industries. We omitted companies from non-digital industries, as well as companies founded prior to the year 2000. We also omitted companies without text descriptions (or with very short descriptions), as we relied on these descriptions to calculate some of our variables. These were typically very small companies that did not survive very long and typically did not make any acquisitions. Therefore, omitting these observations did not influence our conclusions.
Our final sample consisted of 1,933 companies that made strategic acquisitions and were founded after the year 2000; the broader dataset also contains 123,044 digital companies that did not make any acquisitions. Of the acquiring companies, 278 were platform companies. In total, the sample comprises 3,062 acquisition events.
One hundred fifty students (60% women; Mage = 20.45 years; 77% native English speakers) from a public North American university participated for course credit. We collected the largest sample possible given subject pool constraints.
Our final samples vary in size based on data availability for our dependent variables. We drop all observations with missing control variables. Our baseline regressions are performed on 14,840 firm-quarter observations when SeniorMgmt is the dependent variable. When Firm is our dependent variable, we have 14,977 firm-quarter observations.
We allow apps to drop out of the sample (e.g., due to being removed or due to missing data) and to enter the dataset (i.e., due to being released).
Unless otherwise reported, no participants were excluded.
Only participants who passed an initial attention check were eligible to participate.
Forty-eight participants failed the attention check, which left us with 369 participants.
We excluded six participants who did not fit the recruitment criteria, as they were visitors who were not affiliated with the university. In addition, three participants did not provide valid Instagram handles and were thus excluded. We were left with 193 participants, 140 of whom agreed to take the survey.
Attention checks and/or IP address checking software were used to screen participants before they entered online experiments to ensure data quality (Dennis, Goodson, and Pearson 2020; Winter et al. 2019), except in Study 2, in which open-ended questions were embedded to discourage automated responses, and Study 3b, which was conducted in person in a lab. Data were analyzed only after collection was complete.
We have several sample selection requirements. First, we remove firms that are not taxed as corporations as well as firms with missing Central Index Keys (CIKs).21 Second, we exclude single-state firms that have material subsidiaries in only one state, because addback statutes are supposed to affect the tax avoidance behavior of multistate firms. Third, we delete firm-year observations that have both negative state income tax and negative domestic pretax income, as these firms pay no state income taxes and thus are unlikely to be affected by state tax policies.22 Next, we exclude firms with missing industry code and firms in nonpatent industries.23 Further, to ensure enough within-firm variation for our analyses, we require each firm to have at least three observations in our sample period.24 Lastly, we restrict the sample to observations with nonmissing data to compute the variables used in the main tests. Our final sample includes 11,228 firm-year observations, which belong to 1,946 unique firms.
Following prior research, we apply several screens to the fund data. To address the concern that hedge funds may backfill returns when newly added to the database, we exclude the first 12 months of returns for each fund. We only include funds that report monthly net-of-fee returns in U.S. dollars and allow for redemption at a monthly or higher frequency.9 We also delete duplicate funds and funds with assets under management below $5 million.10 Finally, we require each fund to have at least 30 return observations. After these screens, our sample contains 4,073 hedge funds over the period 1994 to 2018.
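A minimal sketch of these screens, assuming a hypothetical fund-month panel (column names are illustrative, not the actual TASS layout):

```python
import pandas as pd

def apply_screens(funds: pd.DataFrame) -> pd.DataFrame:
    """Apply the data screens to a hypothetical fund-month panel with columns
    fund_id, date, ret (monthly net-of-fee return, USD), and aum_musd
    (assets under management in $ millions); names are illustrative."""
    funds = funds.sort_values(["fund_id", "date"])
    # Backfill bias: drop the first 12 reported months of each fund.
    funds = funds[funds.groupby("fund_id").cumcount() >= 12]
    # Size screen: drop fund-months with AUM below $5 million.
    funds = funds[funds["aum_musd"] >= 5]
    # Require at least 30 remaining return observations per fund.
    obs = funds.groupby("fund_id")["ret"].transform("count")
    return funds[obs >= 30]
```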
The four dependent variables are UPDATE, NEWAPP, MULTIHOMED, and CAT_NUMAPPS. The term UPDATE is an indicator that takes a value of one if app i was updated in month t. To identify updates, we obtained the version number of app i in month t. If the version number changed between two consecutive periods, we coded UPDATE as one (e.g., a change from 1.1 to 1.2). We created a second variable, UPD_MINOR, coded as one if a so-called minor update was performed.
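A minimal pandas sketch of this coding, assuming a hypothetical app-month panel with the observed version string; the major/minor convention used for UPD_MINOR is our assumption, not the authors' definition:

```python
import pandas as pd

# Hypothetical app-month panel with the version string observed each month.
panel = pd.DataFrame({
    "app_id":  [1, 1, 1, 2, 2],
    "month":   ["2017-01", "2017-02", "2017-03", "2017-01", "2017-02"],
    "version": ["1.1", "1.2", "1.2", "3.0", "3.1"],
})
panel = panel.sort_values(["app_id", "month"])

# UPDATE = 1 if the version number changed relative to the previous month.
prev_version = panel.groupby("app_id")["version"].shift()
panel["UPDATE"] = (panel["version"].ne(prev_version) & prev_version.notna()).astype(int)

# UPD_MINOR = 1 if only the part after the first dot changed (e.g., 1.1 -> 1.2);
# this major/minor rule is an illustrative assumption.
major = panel["version"].str.split(".").str[0]
prev_major = major.groupby(panel["app_id"]).shift()
panel["UPD_MINOR"] = (panel["UPDATE"].eq(1) & major.eq(prev_major)).astype(int)
```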
The term NEWAPP is an indicator coded as one if developer j of app i released a new app in month t.
The variable MULTIHOMED is an indicator coded as one if app i was available on Apple iOS in month t. For each app, the variable is coded as zero until an app is multihomed, and then is coded as one for the remaining periods. There is variation in the variable across apps, within developers, and over time.
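Because MULTIHOMED is an absorbing indicator, it can be coded with a cumulative maximum; a minimal sketch under a hypothetical iOS-availability flag:

```python
import pandas as pd

# Hypothetical app-month panel; on_ios = 1 if the app was observed on Apple iOS
# in that month (column names are illustrative).
panel = pd.DataFrame({
    "app_id": [1, 1, 1, 1],
    "month":  ["2017-01", "2017-02", "2017-03", "2017-04"],
    "on_ios": [0, 0, 1, 0],
})
panel = panel.sort_values(["app_id", "month"])

# MULTIHOMED is zero until the app is first multihomed and one for all
# remaining periods, so the cumulative max within each app suffices.
panel["MULTIHOMED"] = panel.groupby("app_id")["on_ios"].cummax()
```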
Regarding the independent variables, our empirical framework required two main indicators: AWARD, coded as one for apps of award winners, and AFTER, coded as one if month t is after the award ceremony.
Choosing control variables is not trivial because of the risk of including “bad controls.”2 We nevertheless deem it necessary to control for two variables. First, we control for pricing, since product strategies are likely to be influenced by app pricing. The term PRICE holds app i’s purchase price in USD in month t. We applied a log(1 + x) transformation to account for its skewed distribution. We use alternative price variables in the robustness section to account for potential concerns over the empirical distribution of PRICE. Second, we control for app quality.
We rely on two further variables in additional analyses. The term NUMRATINGS is the total number of ratings submitted for app i as of month t, and EMPLOYEES is a proxy for firm size in terms of the number of employees of the developer. We infer the number of employees from the number of LinkedIn members who state they worked for each firm. We obtain these data from each firm’s LinkedIn page. The term INAPP is coded as one if app i offers in-app purchases in month t. The term APP AGE is the age of app i in month t, measured in months.
In addition to using runners-up as a control group, we constructed two alternative control groups based on matching. Matching is based on the idea that units—in our case developers—are selected and placed into an artificial control group based on their observational similarity. We selected developers based on their similarity before the award.
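Purely as an illustration of one of the two matching approaches mentioned above (propensity score matching with nearest-neighbor selection), a minimal sketch with hypothetical pre-award covariates:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical developer-level pre-award covariates (names and values are ours).
devs = pd.DataFrame({
    "developer_id": [1, 2, 3, 4, 5, 6],
    "winner":       [1, 0, 1, 0, 0, 0],
    "pre_rating":   [4.6, 4.2, 4.5, 4.4, 3.9, 4.6],       # pre-award average rating
    "pre_installs": [12.0, 10.5, 11.8, 11.9, 9.7, 12.1],  # pre-award log installs
})

# Propensity of winning the award given pre-award observables.
X = devs[["pre_rating", "pre_installs"]]
devs["pscore"] = LogisticRegression().fit(X, devs["winner"]).predict_proba(X)[:, 1]

# Match each winner to the non-winner with the closest propensity score.
winners = devs[devs["winner"] == 1]
controls = devs[devs["winner"] == 0]
matches = {
    w: controls.loc[(controls["pscore"] - p).abs().idxmin(), "developer_id"]
    for w, p in zip(winners["developer_id"], winners["pscore"])
}
print(matches)
```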
As a first step, we take the text description of each company and convert it into a TF-IDF vector. We remove keywords that occur frequently. These vectors reflect the extent to which particular keywords occur in each text description. We then compare each pair of acquirer and target firms, using cosine similarity, to measure how much they overlap on these keywords. We also test the robustness of this approach by using topic modeling to reduce the dimensionality of these vectors (following Shi et al., 2016) or by comparing overlap based on industry tags (e.g., Cloud Storage).
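A minimal sketch of this step, assuming hypothetical description strings (in the actual analysis the vectorizer would be fit on all company descriptions and very frequent keywords removed; here only English stop words are dropped for brevity):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical company descriptions standing in for CrunchBase profile text.
descriptions = [
    "Cloud storage and file synchronization platform for enterprises.",  # acquirer
    "Secure enterprise file sharing and cloud backup services.",         # target
]

# TF-IDF vectors over the descriptions.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(descriptions)

# Cosine similarity between the acquirer's and the target's keyword profiles.
similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"Acquirer-target description similarity: {similarity:.3f}")
```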
We control for industries or business areas (Industry FE) using dummy variables to indicate whether a company is assigned a particular tag in its profile. These tags are used to describe the business of the company, as described above (e.g. Cloud Storage).
We control for Firm Size based on the number of employees that the company employs. This is reported as a range (e.g., 50–100 employees) in our dataset, and we therefore use dummy variables to indicate the different groups.
As an additional control, we include Funding Controls, which contains dummy variables for the number of venture funding rounds that the firm has received. This provides a measure of the financial resources that a company has at its disposal to undertake acquisitions.
The comparison data set consisted of an equal number of companies randomly drawn from the Thomson Reuters Eikon database.
We calculated the linguistic name gender of the Interbrand and Thomson Reuters Eikon brands using a method developed by Barry and Harper (1995; Appendix). The name gender score quantifies the degree to which a name is masculine or feminine based on its length, sounds, and stress, as discussed in the introduction, with scores ranging from −2 (very masculine) to +2 (very feminine).
We identify a firm as an affected firm if it has at least one subsidiary in a state during the year in which the state adopts the addback statutes.
Further, to identify firm-year observations impacted by the adoption of addback statutes, we construct an indicator variable, Addback. Specifically, for an affected firm, we set Addback to 1 for the adoption year and all the subsequent years, unless the firm no longer has any subsidiary in states with the addback statutes. If a firm has no subsidiaries in states with the addback statutes in a given year, Addback equals 0.
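A simplified sketch of this coding, assuming hypothetical subsidiary-state-year data; note that it flags any firm-year with a subsidiary in a state whose statute is in force and does not separately enforce the affected-firm definition above:

```python
import pandas as pd

# Hypothetical inputs (names are ours): subs lists firm-year-state material
# subsidiaries; adoption maps a state to its addback-statute adoption year.
subs = pd.DataFrame({
    "firm_id": [1, 1, 1, 2],
    "year":    [1999, 2000, 2001, 2000],
    "state":   ["AL", "AL", "TX", "TX"],
})
adoption = {"AL": 1999}  # states with no addback statute are simply absent

# A firm-year counts as treated if at least one subsidiary sits in a state
# whose addback statute has been adopted by that year.
subs["adopt_year"] = subs["state"].map(adoption)
subs["in_addback_state"] = subs["year"] >= subs["adopt_year"]

addback = (
    subs.groupby(["firm_id", "year"])["in_addback_state"]
    .any()
    .astype(int)
    .rename("Addback")
    .reset_index()
)
print(addback)
```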
Following prior literature on innovation (e.g., Griliches et al., 1987), we use patent-based innovation measures for two reasons. First, patents are an output measure that captures both observable and unobservable inputs into innovation (He and Tian, 2013), whereas R&D expense only reflects observable inputs. Second, reported R&D expenditures contain significant measurement errors. Koh and Reeb (2015) show that almost one half of firms in Compustat report missing R&D expenditures, and about 10 percent of firms with missing R&D expenditures actually file patents. We also find that R&D expense is missing for 58.4 percent of the Compustat population during our sample period. Therefore, we use patent count and citation count to capture the amount and quality of innovation. Our innovation variables are constructed using patent data provided by Kogan et al. (2017).
Following He and Tian (2013), our first innovation variable is Ln_NPat3, which is measured as the natural logarithm of one plus the number of patents filed three years after the year in which the key independent variable Addback is measured.
Our second innovation variable is Ln_NCite3, which is measured as the natural logarithm of one plus the number of non-self-citations received on patents that are filed three years ahead.
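Both variables are simple log(1 + count) transforms; a minimal sketch under hypothetical firm-year counts (the exact three-year-ahead timing convention follows our reading of the text):

```python
import numpy as np
import pandas as pd

# Hypothetical firm-year counts of patents filed (and non-self-citations
# eventually received on those patents) three years ahead of the year in
# which Addback is measured; column names are illustrative.
df = pd.DataFrame({
    "firm_id":  [1, 1, 2],
    "year":     [2000, 2001, 2000],
    "npat_t3":  [3, 0, 12],
    "ncite_t3": [5, 0, 40],
})

df["Ln_NPat3"]  = np.log1p(df["npat_t3"])   # ln(1 + patent count)
df["Ln_NCite3"] = np.log1p(df["ncite_t3"])  # ln(1 + non-self-citation count)
```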
To measure risk-adjusted returns (i.e., alpha), we control for exposures to standard risk factors identified in the hedge fund literature. We start with Fung and Hsieh’s (2004) seven factors: an equity market factor, a small-minus-big size factor, the change in the constant-maturity yield of the 10-year Treasury, the change in the yield spread between Moody’s Baa bond and the 10-year Treasury bond, and three trend-following factors for bonds, currencies, and commodities.13 These factors are commonly used to evaluate hedge fund performance (e.g., Kosowski, Naik, and Teo (2007), Fung et al. (2008), Jagannathan, Malakhov, and Novikov (2010), Sadka (2010), and Cao et al. (2013)). We also control for the inflation rate and default spread, as Bali, Brown, and Caglayan (2011) find that exposures to these two factors are significantly related to hedge fund returns. We further include the momentum factor, as Griffin and Xu (2009) find that hedge funds engage in momentum strategies. Finally, we control for illiquidity risk using Pastor and Stambaugh’s (2003) liquidity factor.
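As a minimal sketch of how alpha can be obtained from such a factor model (the input names are assumptions, and the factor DataFrame is assumed to hold the series listed above, aligned by month):

```python
import pandas as pd
import statsmodels.api as sm

def fund_alpha(excess_returns: pd.Series, factors: pd.DataFrame) -> float:
    """Alpha as the intercept from a time-series regression of a fund's
    monthly excess returns on the factor set (Fung-Hsieh seven factors plus
    inflation, default spread, momentum, and liquidity); both inputs are
    assumed to share the same monthly index."""
    X = sm.add_constant(factors)
    fit = sm.OLS(excess_returns, X, missing="drop").fit()
    return fit.params["const"]
```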
We perform a battery of sensitivity tests. First, instead of tracking returns from the month immediately following portfolio formation, we skip one month. Second, to address concerns about the precision of sentiment beta estimates, we use different combinations of risk factors as control variables in regression (1).
To control for known determinants of hedge fund performance, we perform Fama-MacBeth (1973) cross-sectional regressions of fund excess returns or alpha on sentiment beta, along with various fund characteristics and style dummies. Specifically, we run the following cross-sectional regression of fund excess returns on sentiment beta:

r_{i,t+1} = \lambda_{0,t} + \lambda_{1,t} \hat{\beta}^{S}_{i,t} + \lambda_{2,t}' x_{i,t} + \varepsilon_{i,t+1},

where r_{i,t+1} is the fund excess return in month t + 1, and \hat{\beta}^{S}_{i,t} is fund i’s sentiment beta estimated from regression model (1) using fund returns in the 36-month rolling window from month t − 35 to month t. That is, the key independent variable, sentiment beta, is estimated from a backward-looking window prior to the return evaluation period for the dependent variable of the regression. The control variables x_{i,t} are predetermined fund characteristics including fund size, fund age, management fee, incentive fee, high-water mark dummy, lockup period, redemption notice period, and fund style dummies.
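To make the two-stage structure concrete, a minimal Fama-MacBeth sketch in Python; the panel layout, column names, and the commented call are hypothetical, not the authors' code:

```python
import pandas as pd
import statsmodels.api as sm

def fama_macbeth(panel: pd.DataFrame, y: str, xvars: list) -> pd.Series:
    """Fama-MacBeth (1973): estimate a cross-sectional OLS each month and
    average the monthly coefficients. `panel` is a hypothetical fund-month
    DataFrame with a 'month' column; variable names are illustrative."""
    monthly_coefs = []
    for _, cs in panel.groupby("month"):
        X = sm.add_constant(cs[xvars])
        monthly_coefs.append(sm.OLS(cs[y], X, missing="drop").fit().params)
    return pd.DataFrame(monthly_coefs).mean()

# Illustrative call: next-month excess return on lagged sentiment beta and
# predetermined fund characteristics (column names are assumptions).
# premia = fama_macbeth(funds, "excess_ret_next",
#                       ["sentiment_beta", "log_size", "age", "mgmt_fee",
#                        "incentive_fee", "hwm", "lockup", "notice"])
```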
We focus our collection efforts on employee ratings of their firm and of senior management. We then use the median rating across all employee ratings in a given fiscal quarter, resulting in two variables (SeniorMgmt and Firm). Both variables range from 1 (the lowest rating) to 5 (the highest rating). Decreases in these ratings indicate a worsening of employee perceptions.
We control for return-on-assets because employee satisfaction may be increasing in firm profitability. Moreover, the media may be more likely to cover profitable firms (e.g. Google or Apple). Thus, controlling for profitability limits bias related to media coverage. We control for market-to-book because employees at non-growth firms may rate their firms lower relative to employees at growth firms. We control for leverage because employees at highly levered firms may rate their firms lower because they are concerned about bankruptcy risks. We control for size because large firms receive more media coverage than small firms. The media may “target” large firms for scrutiny more than other firms because larger firms are more well-known (e.g., Chen et al. 2019). Thus, we limit media coverage bias by controlling for size. Moreover, employees at large firms may rate their employers higher than employees at small firms because their salaries are high and/or their jobs are secure. We control for buy-and-hold returns to control for any public information or sentiment that may influence employee ratings.13 Return on assets, leverage and size are seasonally lagged to ensure that we do not control for our hypothesized effect.
At this point, the research assistant pulled out a basket filled with same-flavored macaroons and randomly administered one of two treatments. In one condition, the research assistant invited customers to pick a macaroon, explaining that the bakery had committed to give a free macaroon as a gift in exchange for filling out the survey. Thus, in this condition, the contractual nature of the perk was salient (i.e., high contractuality). In the other condition, the research assistant invited customers to pick a macaroon, explaining that it was a gift from the bakery. Thus, in this condition, the perk was not portrayed as being given out of contractual obligation (i.e., low contractuality). Note that customers in both conditions received an identical gift that was equally unexpected; thus, any effect observed is unlikely to stem from differences in the perceived value of the perk or in how surprising the perk was.
Then, they were randomly assigned to two experimental conditions in which we manipulated the perceived contractuality of a perk.
In the high contractuality condition, participants were told that their order was delivered and were given a note informing them that the restaurant included a $15 bonus gift card that they could redeem in the next three days between 10 A.M. and noon. In the low contractuality condition, participants received the same information, except the bonus gift card had no redemption limitations (see Web Appendix B for all stimuli).
We tested the robustness of our analysis to various approaches for constructing this variable to ensure that this particular definition was not driving our results.
For robustness, we consider two alternative measures of sentiment fluctuations: the monthly change in the University of Michigan consumer sentiment index, which is based on surveys of household confidence in the economy, and the FEARS index of Da, Engelberg, and Gao (2015), which captures sentiment changes based on Internet search volume for keywords that reveal investor concerns about the economy.
Our results are robust to including lagged (by one quarter) and contemporaneous forms of these variables.
We expected this manipulation to affect perceived contractuality because the redemption requirements for the high-contractuality gift card were more specific and restrictive than those for the low-contractuality gift card.
Each participant was presented with a randomly selected subset of 10 of the 50 brands and, for each brand, answered warmth and brand attitude measures in random order.
The experiment was a one-factor (brand name gender: feminine, masculine) between-subjects design.
Participants were told they would be evaluating a one-minute video from a channel of their choice. All participants were offered a choice between watching a channel with one of our pretested names or watching a video of equal length from a randomly selected YouTube channel. In the feminine name condition, participants chose between the “Nimilia YouTube Channel” and a randomly selected YouTube channel; in the masculine name condition, participants chose between the “Nimeld YouTube Channel” and a randomly selected YouTube channel. They next completed the same four-item warmth measure used in Study 2, along with a single-item measure of pleasantness (“To what extent does the Nimilia [Nimeld] Channel sound pleasant?”), in random order. The dependent variable was channel choice.
The experiment employed a one-factor (participation incentive: cash, feminine-named product, masculine-named product) within-subject design.
Students were welcomed to the study and told that, as a thank-you for participating, they could choose either $.50 or one bottle of small-batch hand sanitizer. The hand sanitizers were commercially available customizable products with either a linguistically masculine (Nimeld) or feminine (Nimilia) name on the bottle and the label. (Note: this study was conducted before the onset of the COVID-19 pandemic, and the small, colorful bottles were presented as both fun and functional.)
Finally, we asked whether they had discussed the lab session or incentive choice with anyone else before participating.
Participants were randomly assigned to conditions in a 2 (brand name gender: feminine or masculine) × 2 (typical user: male or female) between-subjects design.
We manipulated typical user gender by telling participants that the sneakers were either for men or for women, which allowed us to hold the product constant.
We included a suspicion check to rule out possible demand effects and manipulation checks asking who the most typical user of the product was (men: −1, women: +1, both: 0) and how masculine or feminine the brand name seemed (1 = “very masculine,” and 5 = “very feminine”). The suspicion check indicated that less than 1.5% of participants suspected that the purpose of the study was to examine brand name and user gender.
We conducted 2 (brand name gender: feminine or masculine) × 2 (typical user: male or female) analyses of variance (ANOVAs) on the brand name gender and typical user gender manipulation check measures.
Participants were randomly assigned to conditions in a 2 (brand name gender: feminine or masculine) × 2 (product category: utilitarian or hedonic) between-subjects design.
We measured warmth by asking participants to indicate the extent to which each brand sounded tolerant, warm, good-natured, and sincere (Fiske et al. 1999; α = .96). Brand attitude was assessed with five items adapted from Chaudhuri and Holbrook (2001) and Brakus, Schmitt, and Zarantonello (2009), asking the extent to which participants were or would be loyal to, committed to, buy, choose, and recommend the brand (α = .95). The brand attitude measure incorporated elements of loyalty because Kervyn, Fiske, and Malone (2012) find that warmth is positively related to brand loyalty. (We also measured competence but do not examine it here; for full details, see Web Appendix C.) All items were measured on seven-point scales (1 = “not at all,” and 7 = “very much”).