Measuring value tradeoffs: problems and some solutions (2024)

Baron, J. (2000). Measuring value tradeoffs: problems and some solutions. In E. U. Weber, J. Baron, & G. Loomes (Eds.), Conflict and tradeoffs in decision making: Essays in honor of Jane Beattie. New York: Cambridge University Press.

Jonathan Baron1
University of Pennsylvania

Abstract

Many applications require the measurement of tradeoffs between values, such as that between money and risk, or between health quality and length of life. One way to measure such tradeoffs is to ask respondents for attractiveness ratings of stimuli consisting of pairs of attributes, e.g., a reduction of 1/100,000 in the annual risk of death for $100. Holistic ratings of this sort are useful if the tradeoffs are independent of irrelevant factors such as the range of values on each attribute within the session, or the magnitude of the highest value. Unfortunately, such independence is not always found. Another approach to eliciting consistent tradeoffs is to ask respondents to resolve the inconsistencies inferred from their responses to tasks that require comparisons of intervals on each dimension. This approach seems more promising. Two experiments illustrate each approach.

The measurement of value tradeoffs is central to applied decision analysis. It was also a major concern of Jane Beattie from the time of her thesis to her work with Graham Loomes and others, described in the last chapter. In part of her thesis, Jane examined the use of holistic rating judgments of two-attribute stimuli, as a way of measuring the tradeoff between the two attributes (Beattie & Baron, 1991).

One possible effect of difficulty is to make judgments of tradeoffs more labile, more influenced by extraneous factors. One kind of tradeoff judgment is to make holistic desirability ratings of stimuli in a set of stimuli that vary in at least two dimensions, e.g., cost of a purchase and travel time in order to buy it. Tradeoffs between two dimensions can be assessed by asking how much of one dimension must be given up in order to compensate for a change in the other dimension, with respect to the effect of these changes on the rating. This measure of tradeoffs should reflect the effect of these changes on the goals that motivate the judgments, e.g., the willingness to sacrifice time to save money. It should not be affected by the range of values on either dimension.

We found, in general, that tradeoffs were, in fact, unaffected by the range of variation, provided that the range conveyed no relevant information about utility. If this result is generally true, then holistic judgments would be a good way to measure tradeoffs, for practical purposes. Mellers and Cooke (1994), however, found range effects in several similar tasks.

The present research finds an inconsistent pattern of sensitivity to ranges themselves. However, it also has found substantial effects of the magnitude of the values, which can be independent of the range of variation within a group of trials. For example, the amount of money that must be saved to justify spending an extra hour is greater when the purchase price is higher, as if the utility of money were judged as a proportion of the price rather than an absolute amount. Some range effects may result from magnitude effects, but we still do not understand the conditions that produce range effects when magnitude is held constant.

Introduction

The measurement of value tradeoffs is central to applied decision analysis. How much money is a life worth? A year of health? An hour's relief from pain? In choosing a cancer treatment, how should we trade off the symptoms of the treatment against the probability of a cure? In buying a car, how should we trade off the safety of the car for the driver against its effects on air pollution? If we could answer these questions on the average, then we could design policies to maximize utility. For example, a health-care provider could provide all life-saving treatments up to the point at which the average cost per year of life exceeds what its customers or citizens, on the average, think should be paid. The same can be said for other public expenditures, from highway safety to protection of wilderness.

A problem with this approach is that responses are often internally inconsistent (Baron, 1997a). Some of the inconsistency is specific to the methods used. For example, the use of hypothetical gambles seems to be distorted by the certainty effect, and, more generally, by the fact that probabilities greater than zero and less than one seem to be treated as more similar than they should be.

Other sources of inconsistency are more ubiquitous. Primarily, people are insensitive to quantity when they compare two attributes. People find it surprisingly easy to say that health is more important than money, without paying much attention to the amount of health or money in question. For example, Jones-Lee, Loomes, and Philips (1995) asked respondents to evaluate hypothetical automobile safety devices that would reduce the risk of road injuries. The respondents indicated their willingness to pay (WTP) for the devices. WTP judgments were on the average only 20% higher for a risk reduction of 12 in 100,000 than for a reduction of 4 in 100,000. Such results imply that the rate of substitution between money and the good, the dollars per unit, depends strongly on the amount of the good. If a risk reduction of 12 is worth $120 and a risk reduction of 4 is worth $100, then the dollars per unit of risk reduction are 10 and 25, respectively. If we extrapolate downward linearly, a risk reduction of 0 would be worth $90. Or we might think that the scale is logarithmic, so 1/3 of the risk reduction would be worth 5/6 of the price. So a risk reduction of 4 · 1/3 · 1/3, or .44, would be worth $100 · 5/6 · 5/6, or $69.44, or about $156 per unit of risk reduction. The dollars per unit could increase without limit. We cannot communicate the size of the error with a confidence interval - even on a logarithmic scale - because the confidence interval is potentially unbounded. This makes it difficult to generalize results to different amounts of money or risk, a generalization that is nearly always required. Even when such generalization is not required, such extreme insensitivity over small ranges raises questions about the validity of any single estimate.
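The arithmetic in this paragraph can be checked directly. The sketch below uses only the hypothetical values from the text ($120 for a reduction of 12 in 100,000, $100 for a reduction of 4); the function name is my own.

```python
# Implied rates of substitution from the hypothetical WTP values in the
# text: $120 for a risk reduction of 12 (per 100,000), $100 for 4.
def dollars_per_unit(dollars, units):
    return dollars / units

rate_12 = dollars_per_unit(120, 12)   # 10.0 dollars per unit
rate_4 = dollars_per_unit(100, 4)     # 25.0 dollars per unit

# Linear extrapolation downward: the implied value of no reduction at all.
slope = (120 - 100) / (12 - 4)        # $2.50 per unit of reduction
wtp_at_zero = 100 - 4 * slope         # $90 for a reduction of 0

# Logarithmic reading: dividing the reduction by 3 multiplies WTP by 5/6,
# so a reduction of 4 * (1/3)^2 (about .44) is worth $100 * (5/6)^2.
units_small = 4 * (1 / 3) ** 2        # about 0.44
wtp_small = 100 * (5 / 6) ** 2        # about $69.44
rate_small = wtp_small / units_small  # $156.25 per unit, and still rising
```

As the text notes, nothing bounds this sequence: the smaller the reduction considered, the larger the implied dollars per unit.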

The problem is not limited to WTP. It happens when respondents are asked to assign relative weights directly to non-monetary attributes. Typically, the attributes are given with explicit ranges, such as ``the difference between paying $10 and paying $20,'' and ``the difference between a risk of 1 in 10,000 and a risk of 2 in 10,000.'' The weights are typically undersensitive to the range of the attributes. If risk is judged to be twice as important as cost, this judgment is relatively unaffected when risk reduction is doubled (Weber & Borcherding, 1993). Keeney (1992, p. 147) calls this kind of undersensitivity to range ``the most common critical mistake.''

A third type of judgment suffers from the same problem, the judgment of the relative utility of two intervals. In health contexts, respondents are often asked to evaluate some condition, such as blindness in one eye, on a scale anchored at normal health and at death. Implicitly, they are asked to compare two intervals: normal - blind-in-one-eye, and normal - death. What happens when we change the standard, the second interval? Normatively, the judgment should change in proportion. For example, keeping normal at one end of each interval, the utility of blind-in-one-eye relative to death should be the product of two other proportions: the utility of blind-in-one-eye relative to blindness (in both eyes), and the utility of blindness relative to death. In fact, people do not adjust sufficiently for changes in the standard (Ubel et al., 1996), just as they do not adjust sufficiently for changes in the magnitude of other dimensions involved in other judgments of tradeoffs. I shall call this phenomenon ``ratio inconsistency,'' since it is based on a product of ratios (following Baron et al., 1999). I shall also view these various forms of insensitivity as manifestations of the same problem. In principle, all known manifestations of insensitivity could be understood as tendencies to give the same answer regardless of the question. It is an open question whether the various forms of inconsistency can be explained in the same ways or not.
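The product-of-ratios constraint can be sketched with made-up utilities (none of these numbers come from the chapter); each judgment is the interval from normal health down to a state, expressed as a proportion of a standard interval.

```python
# Hypothetical interval judgments, each of the form
# (normal - state) / (normal - standard).  The numbers are invented.
blind1_vs_blind2 = 0.5   # blind-in-one-eye interval, relative to blindness
blind2_vs_death = 0.4    # blindness interval, relative to death

# Normative direct judgment against the death standard: the product.
blind1_vs_death = blind1_vs_blind2 * blind2_vs_death   # 0.2

def ratio_consistent(direct, part1, part2, tol=0.01):
    """True if a direct judgment matches the product of the two parts."""
    return abs(direct - part1 * part2) <= tol

# A respondent who reports, say, 0.35 against the death standard has not
# adjusted enough for the larger standard: ratio inconsistency.
```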

Undersensitivity to range can be reduced. Fischer (1995) found complete undersensitivity to range when respondents were asked simply to assign weights to ranges (e.g., to the difference between a starting salary of $25,000 and $35,000 and between 5 and 25 vacation days - or between 10 and 20 vacation days - for a job). When the range of vacation days doubled, the judged importance of the full range of days (10 vs. 20) relative to the range of salaries ($10,000) did not increase. Thus, respondents showed inconsistent rates of substitution depending on the range considered. Respondents were more sensitive to the range, with their weights coming closer to the required doubling with a doubling of the range, when they used either direct tradeoffs or swing weights. In a direct tradeoff, the respondent changed one value of the more important dimension so that the two dimensions were equal in importance, e.g., by lowering the top salary of the salary dimension. (Weights must then be inferred by either measuring or assuming a utility function on each dimension.) In the swing weight method, respondents judged the ratio between the less important and more important ranges, e.g., ``the difference between 5 and 25 vacation days is 1/5 of the difference between $25,000 and $35,000.''

In the direct tradeoff method, the range is given for one dimension only. The task is thus analogous to a free-response contingent-valuation (CV) judgment, so we might still expect - and Fischer still found - some insensitivity. Baron and Greene (1996) found that this insensitivity could be reduced still further by giving no specific ranges for either dimension. Respondents were asked to produce two intervals, one on one dimension and one on the other, that were equally large in utility. For example, instead of asking ``How much would you be willing to pay in increased taxes per year to prevent a 10% reduction in acquisition of land for national parks?'', the two-interval condition asked subjects to give an amount of taxes and a percentage reduction that they would find equivalent. Of course, one end of each interval was zero.

Holistic ratings

Another way to measure tradeoffs is to ask for ratings of stimuli that vary in two or more dimensions. For example, the stimuli could be policies that differ in cost and amount of risk reduced. If the respondent produces enough of these judgments, we could fit simple models to her responses and infer how much of a change in one dimension is needed to make up for a change in another dimension, so that both changes together would yield the same rating. The rating response need not be a linear function of overall utility (but we could assume that it was for a first approximation). A great variety of methods use this general approach. The two most common terms are functional measurement (e.g., Anderson & Zalaski, 1988) and conjoint analysis (Green & Wind, 1973; Green & Srinivasan, 1990; Louviere, 1988).

In such a method, the numbers given to the respondent on each dimension represent attributes that the respondent values, such as minutes or dollars. The values of these attributes do not, we assume, depend on what other things are available. Thus the tradeoff between a given change on one dimension and a given change on the other should be unaffected by the range of either dimension within the experimental session. If a change from 50 to 100 minutes is worth a change from $20 to $40, then this should be true regardless of whether the dollar range is from $20 to $40 or from $2 to $400. The need for invariance in the substitution of time and money arises from the basic idea of utility itself, which is that it is about goal achievement (Baron, 1994). The extent to which goals are achieved depends on what happens, not what options were considered.

Two exceptions should be noted, however. First, sometimes the options considered affect the outcome, through their effects on emotions. Winning $80 may seem better if that is the first prize than if $160 is the first prize, because of the disappointment of not winning the first prize. Second, in some cases the meaning of a description in terms of goal achievement will depend on the range. For example, the raw score on an examination can have a different effect on goal achievement as the range of scores is varied, if the examination is graded on a curve. Even when this is not true, respondents who know little about a quantitative variable may think of it this way, because they cannot evaluate the significance of the numbers (e.g., ``total harmonic distortion'' when buying an audio system - see Hsee, 1996).

Beattie and Baron (1991), using such a holistic rating task, found no effects of relative ranges on rates of substitution with several pairs of dimensions, but we found range effects with some dimensions, particularly those for which the numerical representation was not clearly connected to fundamental objectives, e.g., numerical grades on an exam. (The meaning of exam grades depends on the variance.) This gave us hope that holistic ratings could provide consistent and meaningful judgments of tradeoffs. Lynch et al. (1991) also found mostly no range effects for hypothetical car purchases, except in one study with novice consumers. (They used correlations rather than rates of substitution, however, so it is difficult to tell how much their results were due to changes in variance.) Mellers and Cooke (1994), however, found range effects in tasks where the relation of the numbers to fundamental objectives was clear, e.g., distance to campus of apartments.

The experiments I report here make me more pessimistic about holistic ratings. Although I cannot fully explain the discrepant results, I have been able to show that holistic ratings are generally subject to another effect that is potentially just as serious, specifically, a magnitude effect. People judge the utility of a change or a difference as a proportion of the overall magnitude, even when the change alone is more closely related to the goal (Baron, 1997a). The result is that judgments are dependent on the maximum magnitude on each attribute scale. The classic example is the jacket-calculator problem of Tversky and Kahneman (1981; replicated under some conditions by Darke & Freedman, 1993).

Imagine that you are about to purchase a jacket for $125, and a calculator for $15. The calculator salesman informs you that the calculator you wish to buy is on sale for $10 at the other branch of the store, located 20 minutes' drive away. Would you make the trip to the other store? (Tversky & Kahneman, 1981, p. 457)

Most subjects asked were willing to make the trip to save the $5. Very few subjects were willing to make the trip to save $5 on the jacket, though, in an otherwise identical problem. In both cases, the ``real'' question is whether you would be willing to drive 20 minutes for $5. People judge the utility of saving $5 as a proportion of the total amount, rather than in terms of its effects on other goals, i.e., its opportunity cost. Baron (1997b) found a similar effect: subjects were less willing to pay for government medical insurance for diseases when the number of people who could not be cured was higher, holding constant the number who could be cured. When many people were not cured, the effect of curing a few seemed like a ``drop in the bucket'' and was thus undervalued.

Typically, magnitude and range effects are confounded. Magnitude is defined as the difference between the maximum and zero, and range is defined as the difference between the maximum and minimum. Usually experimenters who vary the range manipulate the maximum as well. Indeed, both Beattie and Baron (1991) and Mellers and Cooke (1994) had higher magnitudes of the maximum on each attribute whenever the range was higher. Evidently, magnitude effects do not always occur. The fact that they occur, however, makes the measure untrustworthy. The point is that they would occur if magnitude were varied enough, so the tradeoff that subjects make is specific to the magnitudes of the dimensions they are given.

Baron (1997b) suggests that the magnitude effect is part of a more basic confusion between similar (and often correlated) quantitative measures. Just as young children answer questions about number as if they were about length (a correlated attribute), and vice versa, so do adults answer questions about differences as if they were about ratios, and vice versa. Differences and ratios are correlated. Thus, in discussions of drug effects on risk, people talk about relative risk (e.g., the ratio of breast cancer cases with the drug to cases without it) rather than change in risk (the difference between cancer probability with the drug and cancer probability without it). It is the latter that is more relevant to decision making.
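The difference-versus-ratio distinction can be made concrete with illustrative probabilities (invented for the example, not taken from any study):

```python
# Illustrative annual probabilities of breast cancer with and without
# some drug; the numbers are invented for the example.
p_with = 0.0004
p_without = 0.0002

relative_risk = p_with / p_without   # 2.0: "the drug doubles the risk"
risk_change = p_with - p_without     # 0.0002: 2 extra cases per 10,000

# The same relative risk of 2.0 would also describe a jump from 0.2 to
# 0.4, a change 1,000 times larger in absolute terms; only the
# difference carries the information the decision needs.
```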

Both pure range effects and magnitude effects can result from the use of a proportionality heuristic. Someone who uses this heuristic evaluates a change on one attribute as a proportion of something else, even when it should be evaluated on its own. This is a reasonable heuristic to use when we know nothing about the meaning of an attribute. For example, when we evaluate the difference between 30 points and 40 points on a midterm exam, the meaning of this difference may well depend on whether the range of scores was 20-50 or 0-60.

Overview

In the rest of this chapter, I describe two sets of experiments. The first set shows the existence of magnitude effects in holistic ratings and describes some of the limits on their occurrence. The results are damaging to the idea of using holistic ratings to measure tradeoffs.

In the last two experiments, I explore a different approach, picking up where Loomes left off in his chapter. Perhaps we can measure value tradeoffs by working with the respondents, facing them with the inconsistencies in their judgments and asking them to resolve these inconsistencies. Decision analysts claim that consistency checks usually do not violate the respondent's best judgment, for example: ``... if the consistency checks produce discrepancies with the previous preferences indicated by the decision maker, these discrepancies must be called to his attention and parts of the assessment procedure should be repeated to acquire consistent preferences. ... Of course, if the respondent has strong, crisp, unalterable views on all questions and if these are inconsistent, then we would be in a mess, wouldn't we? In practice, however, the respondent usually feels fuzzier about some of his answers than others, and it is this degree of fuzziness that usually makes a world of difference. For it then becomes usually possible to generate a final coherent set of responses that does not violently contradict any strongly held feelings'' (Keeney & Raiffa, 1993, p. 271). Such checks can even improve the perceived validity of numerical judgments (e.g., Keeney & Raiffa, 1993, p. 200).

Baron et al. (1999) found evidence supporting these claims in studies of elicitation of health utilities. Consistency checks for the kind of ratio inconsistency described above led to no serious objections from the subjects. Moreover, different kinds of utility measures were more likely to agree when each measure was adjusted, by the subject, to make it consistent.

Experiment 1

Experiment 1 built on the jacket-calculator problem. Subjects did three tasks:
Rating: Subjects rated purchases that differed in price and time, for attractiveness.
WTP: Subjects expressed their willingness to pay (WTP) money to save time, or time to save money.
Difference judgments: Subjects compared a time interval (e.g., ``the difference between 30 minutes and 1 hour'') and a price interval (e.g., ``the difference between $90 and $100''). They indicated which mattered more to them, and the relative sizes of the intervals in terms of what mattered.

Magnitude and range varied somewhat independently. Magnitude was manipulated by multiplying price by 3. Range was varied in the rating task by changing the first two items in each group of 8.

Method

Fifty-three subjects - 38% males, 92% students, ages 17-52 (median 19) - completed a questionnaire on the World Wide Web. Subjects were solicited through postings on newsgroups and links from various web pages. They were paid $4, and they had to provide an address and social-security number in order to be paid.

The questionnaire had two orders. Order did not affect the results. The questionnaire had four sections: Ratings, WTP, Difference judgments, and Ratings again.

The ratings task began, ``Imagine you are buying a portable compact-disk player and you have settled on a brand that lists for $120. It is available at several stores, which differ in travel time from where you live (round trip), sale price, and terms of the warranty. Rate the following options for attractiveness on a scale from 1 to 9, where 1 means that you are very unlikely to choose this option and 9 means that you are very likely to choose it. Try to use the entire scale. The first two items are the worst and best.'' Items differed in price, travel time, and warranty. The warranty was not analyzed. It was used simply to create variation to allow duplicate presentation of items that were otherwise the same. It was counterbalanced with all other variables. A typical list of items to be rated was:

CD players
Price   Travel time   Warranty   Rating
$120    1.5 hours     none
$80     30 min.       1 year
$110    30 min.       none
$90     1.5 hours     1 year
$100    30 min.       none
$110    1 hour        1 year
$100    1.5 hours     none
$90     1 hour        1 year
$120    1.5 hours     1 year
$80     30 min.       none
$110    30 min.       1 year
$90     1.5 hours     none
$100    30 min.       1 year
$110    1 hour        none
$100    1.5 hours     1 year
$90     1 hour        none

Notice that, in this list, the first two items in each group of 8 have a price range of $40 and a time range of 1 hour. In the contrasting condition, the time range was 2 hours (0 to 2 hours) and the price range was $20 ($90 to $110). In the high-magnitude conditions, price was simply multiplied by 3, so that the range was also multiplied by 3. Two goods, a CD player and a TV set, appeared in two orders, given to different subjects. In one order, the conditions were:
CD, low price, high price range (low time range);
TV, high price, high price range;
CD, low price, high time range (low price range);
TV, high price, high time range.
In the other order, the conditions were reversed.

In between the first two and second two ratings were the WTP and Difference conditions (always in that order). A typical item in the WTP condition read, ``You plan to buy a $110 CD player at a store that is 1 hour away. What is the most time you would be willing to spend traveling in order to buy it for $100 instead?'' or ``What is the most you would be willing to pay for one that is 30 min. away?'' The subject was instructed to answer in terms of total price or time. For the first order, the WTP conditions were ordered as shown in Table 1, and these were reversed for the second order.

Table 1.
Goods used for WTP conditions in Experiment 1. In the rightmost column are the geometric means of the inferred dollars per hour.

       Initial:           Change to:
Good   Price   Time       Price or time   Dollars/hour
CD     $90     1.5 hours  30 min.         $15.68
CD     $110    30 min.    $90             $39.56
CD     $90     1.5 hours  1 hour          $21.63
CD     $110    30 min.    $100            $30.52
TV     $270    1.5 hours  30 min.         $29.44
TV     $330    30 min.    $270            $76.02
TV     $270    1.5 hours  1 hour          $38.86
TV     $330    30 min.    $300            $60.84
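The inference behind the rightmost column of Table 1 can be sketched as follows. The $105.68 response below is hypothetical, chosen so that the implied rate matches the geometric mean in the table's first row; the function name is my own.

```python
def dollars_per_hour(price0, hours0, price1, hours1):
    """Rate of substitution implied by a WTP response: extra dollars
    paid (or saved) per hour of travel time saved."""
    return (price1 - price0) / (hours0 - hours1)

# Table 1, first row: a CD at $90 with 1.5 hours of travel; the subject
# states the most they would pay for one only 30 minutes away.  A
# hypothetical answer of $105.68 implies a rate of $15.68 per hour:
rate = dollars_per_hour(90, 1.5, 105.68, 0.5)
```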

For the Difference judgment, a typical item was:

Which difference matters more to you?
1. The difference between $90 and $100 for a CD player.
2. The difference between 30 minutes and 1.5 hours travel time.

What percent of the larger difference is the smaller difference, in terms of how much it matters?

(In retrospect, the wording of this item is difficult to understand. In the data analysis, however, subjects who showed misunderstanding by responding in the reverse way were eliminated.)

For the first order, the items are shown in Table 2 (reversed for the second order). Notice that the range manipulation was in both price and time: when the price range is higher, the time range is lower. This makes the range manipulation stronger.

Table 2.
Items used in the Difference task in Experiment 1. The table shows the good, the intervals compared, and the geometric mean implied dollars per hour of the responses.

CD   $90 - $100    vs.  30 minutes - 1.5 hours   $15.54
CD   $90 - $110    vs.  30 minutes - 1 hour      $24.06
TV   $270 - $300   vs.  30 minutes - 1.5 hours   $25.83
TV   $270 - $330   vs.  30 minutes - 1 hour      $56.47

Results

The design permitted an inference of the tradeoff between dollars and hours in all conditions. For ratings, I calculated the orthogonal contrasts for the price and time effects on ratings, and took the ratio. I calculated the geometric mean across subjects and did statistical tests on the logarithms. (It is arbitrary whether to use the ratio or its reciprocal. Using the log means that this choice affects only the sign, not the distance from zero.)
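The aggregation step can be sketched as follows; the ratios are invented, and only the method (geometric mean via logarithms) is from the text.

```python
import math

# Inferred dollars-per-hour ratios for several subjects (invented values).
ratios = [12.0, 20.0, 35.0, 50.0]

# Geometric mean, computed on the log scale as the text describes.
logs = [math.log(r) for r in ratios]
geo_mean = math.exp(sum(logs) / len(logs))

# Using reciprocals (hours per dollar) only flips the sign of each log,
# so tests on the logs are unaffected by that arbitrary choice.
recip_logs = [math.log(1 / r) for r in ratios]
assert abs(sum(recip_logs) + sum(logs)) < 1e-9
```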

For ratings, the inferred monetary value of time was affected by magnitude (confounded with range) but unaffected by range alone. The (geometric mean) values were (for subjects who had sufficient data in both conditions being compared): $20.32 for high amounts of money vs. $89.33 for low amounts (t(47) = 13.71, p = .0000); and $43.60 when the money range was small (and the time range large) vs. $46.13 when the money range was large (and time small).

For WTP, geometric means of inferred dollars per hour are shown in Table 1. T tests on the means of the relevant conditions (e.g., all the high-dollars vs. all the low-dollars) showed that time was worth more when the dollar magnitude was higher (t(52) = 11.4, p = .0000) and when the subject responded with money rather than time (t(44) = 8.14, p = .0000). Subjects also paid more for time when the range of time was small or when the range of money was large, holding magnitude constant (t(51) = 4.34, p = .0001). In sum, the WTP measure showed both magnitude and range effects, whereas the rating measure showed only a magnitude effect.

Difference judgments also showed effects of both range (t(47) = 3.85, p = .0004) and magnitude (t(47) = 5.69, p = .0000), as shown in Table 2. Magnitude was confounded with range. So these effects can be seen as a replication of the finding that matching judgments are insensitive to range (e.g., Fischer, 1995; see Baron, 1997a, for discussion).

To summarize the results, all three measures - holistic ratings, willingness to pay, and difference judgments - were affected by magnitude (confounded with range), but only difference judgments and WTP were affected by range alone. One explanation of these results is that the WTP and difference tasks presented two ends of the range to be compared, and this encouraged subjects to consider these two ends as the relevant reference points. Holistic ratings, by contrast, may have allowed subjects to adopt an implicit zero as the low end of each range.

Whatever the explanation, the fact remains that magnitude effectsrender these tasks unsatisfactory as measures of tradeoffs.

Experiment 2

Experiment 2 manipulated the range and magnitude locally, within each group of four hypothetical purchases, by presenting two items to establish a range and then another two to test the effect of the first two. Range was manipulated by holding constant the top of each dimension and varying the bottom: in one condition, the money ranged from $120 to $100 and the time from 120 to 0 min., and, in the other condition, the money ranged from $120 to $80 and the time from 120 to 60 min. The magnitude manipulation simply added $100 to the price, holding range constant.

Method

Eighty subjects - 25% males, 51% students, ages 16 to 51 (median 23) - completed a questionnaire on the World Wide Web for $5. The questionnaire began:

Purchases: time and money

This is about how people make tradeoffs between time and money when they buy consumer goods. Imagine that all the items refer to some piece of audio or video equipment like a compact-disk player or a TV. You have decided to buy a certain model in the price range indicated on each screen. The issue is whether you are willing to travel some distance in order to save money on the price.

Half the time, you will evaluate one purchase at a time on a 9-point scale (1=very unlikely to buy, 5=indifferent, 9=very likely to buy). The rest of the time, you will compare two purchases, also on a 9-point scale (1=A is much better, 5=equal, 9=B is much better). Some purchases will be repeated several times. This is not to annoy you but to make sure that you pay attention to their existence. When you see these repeated purchases, you don't have to give the same answer you have given before, but you can if you want.

There are 56 screens of questions (2 or 4 questions on a screen), followed by a few questions about you.

Each single-purchase evaluation (evaluation, for short) screen had four purchases, and each purchase-comparison screen had two. The purchases were described in terms of price and time, e.g., ``$100, 60 minutes.'' Table 3 shows the base values used for both evaluation and comparison conditions:

Table 3.
Base conditions for Experiment 2. Each row represents the items presented on one screen. In the comparison condition, the subject compared A and B, and then C and D. In the evaluation condition, the subject evaluated A, B, C, and D. In cases 1-7, the time range is high (0-120) and the dollar range low (100-120). In cases 8-14, the time range is low (60-120) and the dollar range high (80-120).

                      Purchase:
         A          B          C          D
Case   $   min.   $   min.   $   min.   $   min.
  1   100  120   120    0   120   60   100  120
  2   100  120   120    0   110   60   100   90
  3   100  120   120    0   120   90   110  120
  4   100  120   120    0   120   60   100   90
  5   100  120   120    0   120   90   100  120
  6   100  120   120    0   120   60   110  120
  7   100  120   120    0   110   60   100  120
  8    80  120   120   60   120   60   100  120
  9    80  120   120   60   110   60   100   90
 10    80  120   120   60   120   90   110  120
 11    80  120   120   60   120   60   100   90
 12    80  120   120   60   120   90   100  120
 13    80  120   120   60   120   60   110  120
 14    80  120   120   60   110   60   100  120

Each comparison screen presented a comparison of A and B, and of C and D. Each evaluation screen presented A, B, C, and D separately. Notice that, within this basic design, the first 7 cases have a high range of times (0-120 min.) for purchases A and B, and a low range of prices ($100-$120). The second 7 cases are the reverse (60-120 min. vs. $80-$120). The last two purchases in each case are the same for the corresponding items. Thus, effects of the relative ranges of times and prices are determined by examining the responses to purchases C and D. Notice also that the tops of the ranges ($120 and 120 min.) are constant within the items in the basic design.

This basic design was replicated four times, to make the 56 screens. Replications 1 and 2 were comparisons, 3 and 4 were evaluations. (Because of a programming error, evaluation data were lost for 28 subjects, leaving 52.) Replications 2 and 4 extended the magnitude of prices (but not their range) by adding $100 to each price. Comparisons of replications 2 with 1, and 4 with 3, then, test for a magnitude effect.

The order of the 56 screens was randomized separately for each subject.

Results

As a measure of the tradeoff, I computed the relative preference for the option with the lower price (and higher time). If people evaluate price and time with respect to their ranges, this relative preference would be greater when the range of prices is small and the range of times is high. This result occurred in the evaluations (t(51) = 4.67, p = .0000) but not in the comparisons (t = 0.94). For the evaluation items, when the price range was small, subjects favored the low-priced item by a mean rating difference of .30, but when the range was high, they favored the low-time item by .34.

A simple explanation of this result is that, in the evaluation condition, subjects attend to all four items presented on each screen. When one of the items contains a very low price, they give it a high rating, but then they feel obliged to give a lower rating to the item that does not have such a low price. In sum, for the evaluation items, the first two items set up a range of responses. In the comparison items, on the other hand, subjects simply compare the two items they are given. They do not feel bound by their responses to other items on the same screen.

Subjects showed no significant magnitude effect in either condition. Although this result seems optimistic, the presence of a range effect undercuts the optimism for using this task to measure value tradeoffs. The magnitude effect may depend on encouraging the subject to use 0 as one of the reference points. When both ends of the dimension are explicitly stated (e.g., 120 minutes and 80 minutes) - rather than leaving it implicit that one end is 0 - range effects may take over.

Experiment 3

Experiments 1 and 2 show either range effects or magnitude effects in holistic rating. Despite the promising results of Beattie and Baron (1991), the use of holistic rating tasks does not seem to provide a reliable means of obtaining consistent measures of tradeoffs. The measures it provides seem to depend on what subjects use as the top and bottom reference points of each scale.

Another approach to eliciting consistent tradeoffs is to face the respondent with her inconsistencies and ask her to resolve them. That is difficult to do in holistic rating tasks, because the respondent would have to deal with a great many responses at once. When the respondent makes direct judgments of relative magnitude, however, resolution of inconsistency might be easier.

Experiment 3 is an example of one method that might be used to help in the resolution of inconsistency. It involves the comparison of utility intervals. Examples of possible intervals include ``the difference between 60 and 120 minutes,'' ``the difference between $90 and $120,'' and ``the difference between normal health and complete hair loss.'' The last sort of difference is of interest for measurement of health utilities. For example, if we wanted to determine whether the benefit of chemotherapy for cancer is worth the cost, part of the cost might be the side effects of the therapy. A standard way to measure utilities in health is to compare everything to the interval between ``normal health'' and ``death.'' Policy makers often assume that this interval has the same utility for everyone.

Experiment 3 concerns health intervals of this sort rather than those involving time and money. The subject judges the utility of interval A as a proportion of B, B as a proportion of C, and A as a proportion of C. The AC proportion should be the product of the AB proportion and the BC proportion. Typically, the AC proportion is too high (as I noted earlier), which is a kind of insensitivity to the standard of comparison.
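To make the multiplicative constraint concrete, here is a small worked example in Python; the proportions are invented for illustration and are not data from the experiment:

```python
# Hypothetical interval judgments, expressed as proportions.
ab = 0.50  # utility of interval A as a proportion of interval B
bc = 0.40  # utility of interval B as a proportion of interval C

# Consistency requires that A as a proportion of C equal the product.
ac_consistent = ab * bc
print(ac_consistent)  # 0.2

# The typical finding: the directly judged AC proportion is too high,
# e.g., 0.30 rather than 0.20 (insensitivity to the standard of comparison).
ac_judged = 0.30
print(ac_judged > ac_consistent)  # True
```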

In the method used here, the subject is forced to resolve the inconsistency but is not told how to resolve it. The subject answers three questions on a computer screen. Then, if they are inconsistent, buttons appear on the screen next to each judgment. Each button says ``Increase'' or ``Decrease'' according to whether the judgment is too low or too high, respectively, relative to the other two judgments. Each button raises or lowers its associated response by one unit. The subject can make the responses consistent by clicking any or all of the buttons.

This experiment used three different methods for comparing intervals: time tradeoff (TTO), standard gamble (SG), and direct rating (in two versions, DT and DP, to be described). In the TTO method, the subject made a judgment of how many weeks with one health condition were equivalent to 100 weeks with a less serious health condition. The ratio of the answer to 100 is taken as a measure of the utility of the less serious health condition relative to the more serious one, on the assumption that time and utility multiply to give a total utility. In the SG method, the subject gives a probability of the more serious health condition, and this is taken as a measure of the utility of the less serious condition, on the assumption that the expected utility is what matters. The direct rating method asks simply for a comparison of the intervals.
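Under the stated assumptions, each method's raw response converts to a relative utility as follows. This is a sketch of the scoring rules described above, not code from the study:

```python
def tto_utility(weeks):
    """Time tradeoff: if `weeks` with the more serious condition is judged
    equal to 100 weeks with the less serious one, the utility of the less
    serious condition, relative to the more serious one, is weeks/100
    (assuming time and utility multiply to give total utility)."""
    return weeks / 100

def sg_utility(percent_chance):
    """Standard gamble: if a `percent_chance`% chance of the more serious
    condition is judged equal to the less serious condition for sure, the
    relative utility is percent_chance/100 (assuming expected utility)."""
    return percent_chance / 100

print(tto_utility(50))  # 0.5: 50 weeks of S2 judged equal to 100 weeks of S1
print(sg_utility(25))   # 0.25
```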

The intervals to be compared were constructed by manipulating either the health condition or its probability or duration. Each interval was bounded by normal health at one end. Two health conditions were used for the other end of each set of intervals, one more severe and one less severe. For TTO and DP (where P stands for probability), the third condition was a 50% chance of the less severe health condition. For SG and DT (T for time), the third condition was 50 weeks of the less severe condition, instead of 100 weeks.

The idea of manipulating a health condition by changing its probability comes from Bruner (1999). Bruner was interested in measuring the utilities of the major side effects of prostate cancer treatments, sexual impotence and urinary incontinence. She used the time-tradeoff method. She asked subjects, in effect, how much of their life expectancy they would sacrifice rather than have a treatment that would give them an 80% chance of impotence, or a 40% chance, for example. Over a wide range of probabilities, the answer to this question was insensitive to probability. Subjects' willingness to sacrifice their life expectancy did not depend on whether the probability of impotence was 40% or 80% (although they were a little less willing when it went up to 99%). This sort of insensitivity to probability makes the measure useless as a way of eliciting judgments of the utility of impotence.

The critical question is the one that compares the discounted less severe health condition (50% or 50 weeks) with the more severe condition. For this to be a good utility measure, the answer should be half of that to the question that compares the non-discounted less severe condition to the more severe condition. Will the adjustment process lead to this result?

Method

Sixty-three subjects completed a questionnaire on the World Wide Web, for $3. The subjects were 60% female, 51% students, and had a median age of 24 (range: 13 to 45). Three additional subjects were not used because they gave the same initial answer to every group of items.

The introduction to the study, called ``Health judgments,'' began:

This study is about different ways of eliciting numerical judgments of health quality. If we could measure the loss in health quality from the side effects of various cancer treatments, for example, we could help patients and policy makers decide whether the benefits of treatment are worth the costs in loss of quality.

The side effects are always written in CAPITAL LETTERS. Here are the effects:

HAIR LOSS (complete)
NAUSEA (food consumption half normal)
DIARRHEA (three times per day)
FATIGUE (enough to be unable to work)
There are also combinations of effects.

In some questions, you make two options equal by saying how much time with one side effect is equivalent to a longer time with some other side effect that isn't so bad. In some cases, the side effects are not certain.

Make sure to try the practice items before going on.

In one kind of question, you give a time. You must answer with a number from 0 to 20. Feel free to use decimals. Here is an example (using deafness):

A. 100 weeks with deafness.
B. 50 weeks with blindness and deafness.

To answer this, you must pick a number for B so that the two options are equal. Try picking different numbers of weeks for B, going up and down, until you feel A and B are equal. Do this now by clicking on one of these two buttons:

The buttons were labeled ``A is worse now'' and ``B is worse now.'' Clicking one button adjusted the number of days in the box by successively smaller amounts. The next practice item used probability instead of time to equate two options. Subjects were also told about the rating items, and they were told the number of items. Finally, they were told:

After you enter your answers, the buttons will suggest changes in your numbers. They will say ``Increase'' or ``Decrease.'' Please choose the button that is most consistent with your true judgment of the conditions. Keep clicking one button or another until you are told you can go on. I am interested in how you choose to adjust your responses when you are forced to adjust them. ...

The items were worded as follows, with S1 being the less severe of two symptoms and S2 the more severe. S1 was always one of the four symptoms listed. S2 was either two of the symptoms, including S1 (e.g., NAUSEA AND FATIGUE when S1 was NAUSEA) or all four. (Each symptom occurred equally often as a member of the pair.)

Time tradeoff

Fill in each blank so that the two options are equal.

A. 50% chance of S1 for 100 weeks
B. S1 for ___ weeks

A. S1 for 100 weeks
B. S2 for ___ weeks

A. 50% chance of S1 for 100 weeks
B. S2 for ___ weeks

Standard gamble

Fill in each blank so that the two options are equal.

A. S1 for 50 weeks
B. ___% chance of S1 for 100 weeks

A. S1 for 100 weeks
B. ___% chance of S2 for 100 weeks

A. S1 for 50 weeks
B. ___% chance of S2 for 100 weeks

Direct judgment (time)

If the difference between normal health and 100 weeks of S1 is 100, how large is the difference between normal health and 50 weeks of S1?

If the difference between normal health and 100 weeks of S2 is 100, how large is the difference between normal health and 100 weeks of S1?

If the difference between normal health and 100 weeks of S2 is 100, how large is the difference between normal health and 50 weeks of S1?

Direct judgment (probability)

If the difference between normal health and S1 is 100, how large is the difference between normal health and a 50% chance of S1? All the symptoms in this example are for 100 weeks.

If the difference between normal health and S2 is 100, how large is the difference between normal health and a 100% chance of S1?

If the difference between normal health and S2 is 100, how large is the difference between normal health and a 50% chance of S1?

To the right of each response box was a button, blank at the outset. After the responses were filled in, the program first checked whether the third was less than each of the others and required a change of answers if it was not. Then the program checked to see whether they were consistent. Consistency was defined in terms of the relation of the three responses: after all the responses were divided by 100, the third response had to be the product of the other two, to the nearest unit. If the responses were consistent, the subject could go on to the next screen. If the third response was too high, the word ``Increase'' appeared on the first two buttons and ``Decrease'' appeared on the third button. The subject clicked any of the three buttons until told to go on. Each button adjusted the response by 1 unit, up for increases and down for decreases. (The subject could also type in the response.) If the third response was too low, ``Increase'' and ``Decrease'' were switched. The subject had to make the responses consistent before going to the next screen.
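The consistency check and button logic can be sketched as follows. This is a reconstruction from the description above (the original program ran as a web questionnaire), with hypothetical responses:

```python
def button_labels(r1, r2, r3):
    """Given three responses on a 0-100 scale, return None if they are
    consistent (after dividing by 100, the third equals the product of
    the first two, to the nearest unit); otherwise return the labels
    shown on the three adjustment buttons."""
    implied = (r1 / 100) * (r2 / 100) * 100  # consistent value of r3
    if abs(r3 - implied) < 0.5:              # consistent to the nearest unit
        return None
    if r3 > implied:                         # third response too high
        return ("Increase", "Increase", "Decrease")
    return ("Decrease", "Decrease", "Increase")

print(button_labels(50, 50, 25))  # None: 25 = 50% of 50%
print(button_labels(50, 50, 40))  # ('Increase', 'Increase', 'Decrease')
```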

Results

Subjects were initially inconsistent and insensitive to probability and time, as expected. The requirement for them to become consistent made them more sensitive to probability and time.

The measure of inconsistency for each screen was the log (base 10) of the ratio of the third answer to the product of the first two answers (after dividing all answers by 100). This would be 0 if responses were consistent. The mean inconsistency over all four elicitation methods was .0245 (t63 = 3.45, p = .0010), which implies that the third answer was about 6% too high, averaged in this way. The four methods differed in the size of this effect (F3,189 = 5.57, p = .0011): .0422 for time tradeoff, .0335 for gambles, .0098 for direct-judgment (probability), and .0123 for direct-time.
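The inconsistency measure can be written out as follows; the responses here are invented. Note that a mean of .0245 corresponds to a third answer about 6% too high, since 10 raised to .0245 is about 1.058:

```python
import math

def inconsistency(r1, r2, r3):
    """Log (base 10) of the ratio of the third answer to the product of
    the first two, after dividing all answers by 100; 0 means consistent."""
    return math.log10((r3 / 100) / ((r1 / 100) * (r2 / 100)))

print(inconsistency(50, 50, 25))  # 0.0: perfectly consistent
print(inconsistency(50, 50, 30))  # positive: the third answer is too high
print(round(10 ** 0.0245, 3))     # 1.058
```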

The main measure of insensitivity to probability and time was the ratio of the third answer to the second, minus .5. (The normative standard was .5.) This measure was positive, as expected if subjects adjust too little for the change in probability and time between the second and third answers. The mean was .0260 (t63 = 2.96, p = .0043). The four methods differed in the size of this effect (F3,189 = 6.49, p = .0001): .0145 for time tradeoff, .0475 for gambles, .0284 for direct-judgment (probability), and .0135 for direct-time. (Note that these differences cannot be understood as involving effects of time vs. probability.)

The response to the first question did not differ significantly from .5 overall, although the four methods differed (F3,189 = 5.33, p = .0015), with means of .4714 for time tradeoff, .5083 for standard gamble, .5042 for direct-time, and .5106 for direct-probability.

The main result of interest concerned the ratio of the second and third questions. It should have been .5, but its mean, over all methods, was, as noted, too high by .0260. After the adjustment for consistency, the mean was .0050, not significantly different from 0. The change was significant (t63 = 3.65, p = .0005). However, this result could arise artifactually if the adjustment button on the third answer said Decrease more often than it said Increase, assuming that the direction of change did not affect the magnitude of change. Accordingly, I computed the measure for the Increase and Decrease trials separately. The average change for the Decrease trials was .0842 (in a downward direction), and the average for the Increase trials was .0479 (in an upward direction). For the 52 subjects who had data for both cases, the mean difference between these was .0382. That is, the downward change was greater than the upward change so that, on the whole, subjects became more consistent (t51 = 3.36, p = .0015). Thus, the benefit of the adjustment is not simply the result of forcing subjects to move in the direction required. When they were forced to move in this direction, they moved more than when they were forced to move in the opposite direction. They also moved more often in the former direction (66% of the possible cases vs. 56%; t51 = 1.76, p = .0422, one tailed).

Experiment 4

Experiment 4 illustrates another approach to consistency adjustment. Subjects are given an estimate of what their responses would be if they were consistent. Unlike Experiment 3, the subjects do not have to adjust their responses. They are given the adjusted responses as a suggestion only. At issue is whether they will accept the suggestion and become more consistent.

Experiment 4 used three different health conditions, rather than two health conditions, one of which was discounted. It used only two methods, time tradeoff and direct rating.

Method

Fifty-eight subjects completed a questionnaire on the World Wide Web, for $5. The subjects were 65% female, 38% students, and had a median age of 27 (range: 12 to 69).

The introduction to the study, called ``Health judgments,'' began:

This study is about different ways of eliciting numerical judgments of health quality. If we could measure the loss in health quality from various conditions, we could measure the benefits of treating or preventing these conditions. This would allow more efficient allocation of resources.

The conditions we consider are:
NEARSIGHTEDNESS (need glasses)
BLINDNESS IN ONE EYE
TOTAL BLINDNESS
PARTIAL DEAFNESS (hearing aid restores normal hearing)
DEAFNESS IN ONE EAR (complete, hearing aid doesn't help)
TOTAL DEAFNESS
LOSS OF WALKING IN ONE LEG
LOSS OF WALKING IN BOTH LEGS
PARALYSIS OF ALL LIMBS
SPLINT ON INDEX FINGER (dominant hand)
SPLINT ON HAND (dominant side)
SPLINT ON ARM (dominant side)
CAST ON FOOT
CAST ON LEG
CAST ON BOTH LEGS
LOSS OF EYEBROWS
LOSS OF HAIR ON FACE AND HEAD
LOSS OF ALL HAIR (including face and head)

Notice that these conditions are in six groups of three. Within each group, the conditions are ordered in severity. The subject had to do a time-tradeoff practice item before beginning, as in the last experiment. They were also told about the rating items. They were told the number of questions and encouraged to use decimals in their answers.

The first 36 trials contained 18 time-tradeoff judgments and 18 direct judgment items. Each time-tradeoff item was introduced with ``How many days makes these two outcomes equal?'' The direct judgment items were worded the same as the practice item.

The subject made each type of judgment three times for each of the six groups of conditions. Within each group of conditions, the subject compared the first and second, first and third, and second and third. S1 and S2 thus stand for the conditions being judged, e.g., ``BLINDNESS IN ONE EYE'' and ``TOTAL BLINDNESS.'' By using all three comparisons, I could test for internal consistency. In particular, the judgment of the extremes (first as a proportion of third) should be the product of the other two judgments (as proportions). These 36 trials were presented in a random order, each on its own screen, which disappeared when the subject responded.

After these 36 trials, the subject saw 24 screens with three judgments to a screen, again in a random order. These consisted of two types of judgments, each in a trained and untrained version, for each of the six groups of conditions.

In both trained and untrained conditions, each screen began: ``Please respond to all three items again in the boxes provided. You do not need to give the same response you gave, and you do not need to make your answers consistent. Try to make your answers reflect your true judgment.''

In the trained direct-judgment condition, the next paragraph read, ``The second column shows one way to make the ratios of your answers agree. The second row percentage is the product of the percentages in the first and third rows.'' In the trained time-tradeoff condition, the last sentence read, ``The second row ratio of days (to 100) is the product of the ratios in the first and third rows. This assumes that all days count equally.''

The subject then saw a table with the items on the left and either two or three columns of numbers, for example:

Original responses                                      Consistent   Final
                                                        responses    responses
PARTIAL DEAFNESS was 5% as bad as DEAFNESS IN ONE EAR   14%          (box)
PARTIAL DEAFNESS was 5% as bad as TOTAL DEAFNESS         2%          (box)
DEAFNESS IN ONE EAR was 5% as bad as TOTAL DEAFNESS     14%          (box)

For the time tradeoff, the upper left entry would have read, ``100 days of DEAFNESS IN ONE EAR was as bad as 5 days of PARTIAL DEAFNESS,'' and the second column would contain days instead of percent. For the untrained condition, the second column was omitted. Let us refer to the three comparisons as AB, AC, and BC, for the three rows, respectively. AB and BC are adjacent, and AC is extreme.

The consistent values in the second column were computed so as to preserve the ratio of the two adjacent comparisons and otherwise make the responses consistent. The two adjacent comparisons (AB and BC) were multiplied by a correction factor, and the extreme comparison (AC) was divided by the same factor. (The correction factor was not constrained to be more or less than one.) In particular, the correction factor was [AC/(AB·BC)]^{1/3}. The corrected values were rounded to the nearest unit, but the subject was encouraged to use decimals.
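The correction can be sketched in Python, using the percentages from the illustrative table above (original responses of 5% in each row); the computed factor reproduces the displayed consistent values of 14%, 2%, and 14% after rounding:

```python
def make_consistent(ab, bc, ac):
    """Multiply the adjacent comparisons AB and BC by the correction
    factor [AC/(AB*BC)]**(1/3) and divide the extreme comparison AC by
    the same factor, preserving the AB:BC ratio while making
    AC equal AB * BC exactly."""
    factor = (ac / (ab * bc)) ** (1 / 3)
    return ab * factor, bc * factor, ac / factor

# Original responses from the example table, as proportions: all 5%.
ab2, bc2, ac2 = make_consistent(0.05, 0.05, 0.05)
print(round(ab2 * 100), round(ac2 * 100), round(bc2 * 100))  # 14 2 14

# The corrected values are consistent:
print(abs(ac2 - ab2 * bc2) < 1e-12)  # True
```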

Results

For each condition group, I computed a measure of inconsistency: log10[AC/(AB·BC)]. (The direction of the ratio is arbitrary. It could be inverted. The log ensures that inversion would affect only the sign of the inconsistency, not its magnitude.) I averaged this measure over the six sets of conditions, for each of the two methods, in the initial, trained, and untrained conditions. I also computed an absolute-value inconsistency measure for each condition group, and averaged it in the same way. Table 4 shows the means of these two measures for the two methods.


Table 4.
Mean inconsistency measures for the two methods.

Method                      Inconsistency (signed)   Absolute inconsistency
Time-tradeoff, initial      .28                      .45
Time-tradeoff, untrained    .11                      .29
Time-tradeoff, trained      .05                      .19
Direct judgment, initial    .25                      .35
Direct judgment, untrained  .08                      .21
Direct judgment, trained    .04                      .14

The trained items were more consistent than the untrained, which, in turn, were more consistent than the initial items, by both signed and absolute measures. I tested this with four analyses of variance, one for initial vs. untrained and one for untrained vs. trained, for each of the two inconsistency measures. It is superfluous to compare initial and trained. But the comparison of initial and untrained tests the (confounded) effects of doing the items together in a group and doing them for the second time. The initial vs. untrained effect was significant for signed measures (F1,57 = 68.2, p = .0000) and for unsigned measures (F1,57 = 62.0, p = .0000). The effect of time-tradeoff vs. direct judgment was significant only for the absolute measures (F1,57 = 17.9, p = .0001): time-tradeoff was less consistent. In neither case was the interaction between method and initial vs. untrained significant. The improvement that resulted from presenting the items together and again was present for both methods.

Inconsistency was smaller in trained than in untrained for both signed and unsigned measures (F1,57 = 8.14, p = .0060, and F1,57 = 51.6, p = .0000, respectively). Again, the effect of method was significant only for the absolute measure (F1,57 = 13.1, p = .0006). The interaction between training and method was not significant. Training improves consistency in both methods.

Discussion

The first two experiments add to existing demonstrations that holistic ratings are sometimes subject to extraneous influences in the form of range effects or magnitude effects. We can account for these effects in general by assuming that subjects adopt two reference points, top and bottom, for each dimension and evaluate the position of an item relative to these reference points, at least some of the time. That is, they think of variation along the dimension as a proportion of the distance from top to bottom rather than as an absolute change along a dimension whose units have value in their own right. This is sometimes a reasonable method of evaluation, e.g., in evaluating examination grades. But it is used even when the subject can evaluate the units in their own right.

What is adopted as the top and bottom is somewhat variable and dependent on details of the task. Experiment 1 found magnitude effects in holistic ratings, WTP, and difference judgment, in the tradeoff of time and money. It found range effects in WTP and difference judgment but not in holistic ratings. As noted, the rating task may have differed from the others in that subjects might have found it easier to adopt zero as the implicit bottom of the range.

Experiment 2 used four items at a time, with the first two items setting the range. It found range effects when subjects evaluated items one at a time, but not when they compared one item to the other within a single question. A possible explanation of this result is that the comparison format provides its own context, so subjects ignore the context in previous questions. If so, the direct comparison may be helpful in overcoming range effects. This conclusion would be similar to that of Fischer (1995). Note, however, that the direct comparison is very much like the direct judgment tasks used in Experiments 3 and 4.

In those experiments, subjects compared two intervals rather than making a judgment of a single two-attribute stimulus. As found in previous studies (Baron et al., 1999; Ubel et al., 1996), all of these measures showed ratio inconsistency: subjects did not give small enough numbers when they compared a small interval to a much larger one (or, conversely, they did not give large enough numbers in their other responses). When this inconsistency was called to their attention, responses became more consistent. This is the recommended approach of applied decision analysis, and so far it seems to work, at least in the sense that it yields usable, consistent answers.

Holistic judgments have other problems. When respondents are asked to rate stimuli with several attributes, they seem to attend only to a couple of attributes that they find particularly important, thus ignoring the less important attributes too much (Hoffman, 1960, Figs. 3-7; von Winterfeldt & Edwards, 1986, p. 365). However, this is likely to be a less serious problem when respondents rate two attributes at a time. Still, the existence of range and magnitude effects seems difficult to avoid. The only way to avoid it seems to be to present explicit intervals for comparison.

This claim is consistent with the finding of Birnbaum and his colleagues (1978; Birnbaum & Sutton, 1992) that subjects asked to judge the ratio of two stimuli respond (with a nonlinear response function) to the difference between the stimuli rather than to the ratio of their distances from zero (no stimulation, in a sensory task). However, when subjects are asked for ratios of differences - e.g., what is the ratio between the utility (or loudness, etc.) difference between A and B and the difference between C and D? - they base their responses on the ratio of the differences, and not the difference of the differences. It would seem that the two-stimulus ratio task does involve four stimuli, because a reference point is implied, e.g., zero loudness or normal health. Birnbaum's result can be taken to imply, however, that we must state the reference point explicitly if we want subjects to use it, so we do this when we ask about differences.

Explicitness in stating the ends of the ranges being compared is one of the prescriptions of decision analysis (Fischer, 1995), but it is not used routinely in other value-elicitation tasks. The results reported here suggest that such explicitness in the comparison of intervals is a good starting point for value elicitation. The rest of the process involves applying consistency checks and asking respondents to make adjustments. The checks used here are only examples of many others that could be used.

References

Anderson, N. H., & Zalinski, J. (1988). Functional measurement approach to self-estimation in multiattribute evaluation. Journal of Behavioral Decision Making, 1, 191-221.

Baron, J. (1994). Thinking and deciding (2nd ed.). New York: Cambridge University Press.

Baron, J. (1997a). Biases in the quantitative measurement of values for public decisions. Psychological Bulletin, 122, 72-88.

Baron, J. (1997b). Confusion of relative and absolute risk in valuation. Journal of Risk and Uncertainty, 14, 301-309.

Baron, J., & Greene, J. (1996). Determinants of insensitivity to quantity in valuation of public goods: contribution, warm glow, budget constraints, availability, and prominence. Journal of Experimental Psychology: Applied, 2, 107-125.

Baron, J., Wu, Z., Brennan, D. J., Weeks, C., & Ubel, P. A. (1999). Analog scale, ratio judgment and person trade-off as utility measures: biases and their correction. Manuscript.

Beattie, J., & Baron, J. (1991). Investigating the effect of stimulus range on attribute weight. Journal of Experimental Psychology: Human Perception and Performance, 17, 571-585.

Birnbaum, M. H. (1978). Differences and ratios in psychological measurement. In N. Castellan & F. Restle (Eds.), Cognitive theory (Vol. 3, pp. 33-74). Hillsdale, NJ: Erlbaum.

Birnbaum, M. H., & Sutton, S. E. (1992). Scale convergence and utility measurement. Organizational Behavior and Human Decision Processes, 52, 183-215.

Bruner, D. W. (1999). Determination of preferences and utilities for the treatment of prostate cancer. Doctoral dissertation, School of Nursing, University of Pennsylvania.

Darke, P. R., & Freedman, J. L. (1993). Deciding whether to seek a bargain: Effects of both amount and percentage off. Journal of Applied Psychology, 78, 960-965.

Fischer, G. W. (1995). Range sensitivity of attribute weights in multiattribute value models. Organizational Behavior and Human Decision Processes, 62, 252-266.

Green, P. E., & Srinivasan, V. (1990). Conjoint analysis in marketing: New developments with implications for research and practice. Journal of Marketing, 45, 33-41.

Green, P. E., & Wind, Y. (1973). Multiattribute decisions in marketing: A measurement approach. Hinsdale, IL: Dryden Press.

Hoffman, P. J. (1960). The paramorphic representation of clinical judgment. Psychological Bulletin, 57, 116-131.

Hsee, C. K. (1996). The evaluability hypothesis: An explanation of preference reversals between joint and separate evaluation of alternatives. Organizational Behavior and Human Decision Processes, 46, 247-257.

Jones-Lee, M. W., Loomes, G., & Philips, P. R. (1995). Valuing the prevention of non-fatal road injuries: contingent valuation vs. standard gambles. Oxford Economic Papers, 47, 676 ff.

Keeney, R. L. (1992). Value-focused thinking: A path to creative decisionmaking. Cambridge, MA: Harvard University Press.

Keeney, R. L., & Raiffa, H. (1993). Decisions with multiple objectives. New York: Cambridge University Press (originally published by Wiley, 1976).

Louviere, J. J. (1988). Analyzing individual decision making: Metric conjoint analysis. Newbury Park, CA: Sage.

Lynch, J. G., Jr., Chakravarti, D., & Mitra, A. (1991). Contrast effects in consumer judgments: Changes in mental representation or in the anchoring of rating scales. Journal of Consumer Research, 18, 284-297.

Mellers, B. A., & Cooke, A. D. J. (1994). Tradeoffs depend on attribute range. Journal of Experimental Psychology: Human Perception and Performance, 20, 1055-1067.

Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453-458.

Ubel, P. A., Loewenstein, G., Scanlon, D., & Kamlet, M. (1996). Individual utilities are inconsistent with rationing choices: A partial explanation of why Oregon's cost-effectiveness list failed. Medical Decision Making, 16, 108-116.

von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. Cambridge University Press.

Weber, M., & Borcherding, K. (1993). Behavioral influences on weight judgments in multiattribute decision making. European Journal of Operational Research, 67, 1-12.

Footnotes:

1 This research was supported by N.S.F. grant SBR95-20288, and by a grant from the University of Pennsylvania Cancer Center.
