THIS OPINION WAS INITIALLY ISSUED UNDER PROTECTIVE ORDER AND IS BEING RELEASED TO THE PUBLIC IN REDACTED FORM ON NOVEMBER 25, 1992

GRANTED: November 16, 1992

GSBCA 12011-P, 12012-P

CENTEL FEDERAL SYSTEMS, INC., and FEDERAL COMPUTER CORPORATION, Protesters/Intervenors,

v.

DEPARTMENT OF THE NAVY, Respondent,

and

INTERGRAPH CORPORATION, Intervenor.

William A. Roberts, III, J. Eric André, Kathleen C. Little, Lucy Gies, Patricia G. Butler, M. Lee Doane, J. Lloyd Horwich, and Peder A. Garske of Howrey & Simon, Washington, DC, and Stephannie A. Wood, corporate counsel, Centel Federal Systems, Inc., Reston, VA, counsel for Protester/Intervenor Centel Federal Systems, Inc.

Gerard F. Doyle and Scott A. Ford of Doyle & Bachman, Washington, DC, counsel for Protester/Intervenor Federal Computer Corporation.

Ellen D. Washington, David P. Andross, and Thomas L. Frankfurt, Information Technology Acquisition Center, Department of the Navy, Washington, DC, counsel for Respondent.

Rand L. Allen, Philip J. Davis, Christopher D. Cerf, Samuel D. Walker, Paul F. Khoury, and James J. Gildea of Wiley, Rein & Fielding, Washington, DC, counsel for Intervenor Intergraph Corporation.

Before Board Judges LaBELLA, Acting Chief Judge, DANIELS, and PARKER.

DANIELS, Board Judge.

Centel Federal Systems, Inc. (Centel), and Federal Computer Corporation (FCC) both protest that the Department of the Navy (Navy), in evaluating proposals for the supply of computer-aided design and computer-aided engineering (CAD/CAE) hardware, software, and services, committed material errors which constituted violations of statute and regulation. The evaluation led to the award of a contract to Intergraph Corporation (Intergraph). Centel and FCC have intervened in support of each other's protest, and Intergraph has intervened in both on the side of the Navy.

As filed on September 11, 1992, the protests contain a multitude of allegations as to the Navy's supposed violations of law. Before the hearing in these cases, the protesters narrowed the issues by discarding some of their counts. The protesters now maintain that the Navy erred in three ways in conducting the procurement: First, the weights given various scores attributed to offerors' live test demonstrations of their proposed systems were inconsistent with the weights prescribed by the solicitation. Second, discussions were inadequate, in light of requirements established in the solicitation. Third, the analysis performed by the Navy to determine which proposal offered the best value to the Government was defective, both because it was based on improper factors and because those factors were applied in an arbitrary fashion. We find for the protesters on all three counts and grant the protests.

Findings of Fact

1. On August 9, 1990, the Navy issued the solicitation in question, requesting proposals to provide CAD/CAE products and services over a twelve-year period. The products and services are to be used in support of facilities engineering applications within the Navy and other Department of Defense component agencies. Protest File, Exhibit 4A at executive summary, 1. This procurement is one of several being conducted by the Navy as part of the "CAD 2" program. Id., Exhibit 4B at C-11 (¶ C0.2(b)). Offerors were informed that the systems being acquired "will be used by Navy facilities engineering and management personnel to increase facilities management productivity, reduce costs through life cycle facilities management, and reduce errors through improved information management." Id.
at C-11 (¶ C0.1), C-21 (¶ C.1.1(a)). The Navy said that it would award under this procurement a requirements contract with a guaranteed minimum purchase of $8.5 million. Id. at B-2 (¶ B1), B-46 (¶ B3), I-10 to I-12 (FAR 52.216-18, 52.216-21).

2. The products to be provided are listed in Section C of the solicitation, "Description/Specifications/Work Statement." This section is 418 pages long and includes thousands of specifications. Protest File, Exhibit 4A at C-1 to C-418. The products fall into three categories: minimum mandatory requirements (MMRs) (products which "are critical to achieving the desired procurement objectives"), evaluated optional products (items which "may be provided at the Contractors' option"), and other technical features (OTFs) (products which "are important to achieving the desired procurement objectives" and "demonstrate a higher level of technical excellence compared to products that do not support these features"). The solicitation provides that "[f]or offerors to receive the maximum technical score, all OTF's must be satisfied." Id. at C-21 (¶¶ C1.2-C1.4).

3. The solicitation prescribed the following structure for the procurement. Offerors were first to submit technical proposals. Protest File, Exhibit 4B at M-2 (¶ M2.1). If a firm certified that its proposal met all MMRs and the Navy considered the proposal complete, the company was invited to make a live test demonstration (LTD) of its proposed system. Id. at M-2 (¶ M2.2). The LTD was designed to "evaluate specific MMRs and OTFs of particular products," so as to reflect the capacity of the proposed system to perform "real work." Id. at M-2 (¶ M2.3); Transcript at 36-37. Each offeror that successfully completed the LTD was then permitted to revise its technical proposal (though not to change products that had assisted in the passage of the LTD), which would be formally evaluated. Protest File, Exhibit 4B at M-3 (¶ M2.5). Successful completion of the LTD also entitled an offeror to submit a price proposal, which was also to be evaluated. Id. at M-3 (¶ M2.4). All evaluations would be performed by a Source Selection Evaluation Board (SSEB), which would prepare a report for a Source Selection Advisory Council (SSAC), which in turn would make recommendations to the Source Selection Authority (SSA), who would select an awardee. Id. at M-2 (¶¶ M2.6, M2.7).

Scoring of technical aspects of proposed systems

4. The solicitation stated that a technical score for each offeror was to be "computed from the results of the LTD and the Technical Proposal Evaluations." Protest File, Exhibit 4B at M-4 (¶ M3.1). In each of these evaluations, MMRs were to be validated and OTFs scored. Id. The LTD score was to represent eighty percent, and the technical proposal score twenty percent, of the total technical score. Id.

5. The LTD consisted of two tests, a Production Test and a Technical Features Test. Protest File, Exhibit 4B at AT2-2 (¶ 1.3). The former "consists of completing a 3-D digital model and preliminary design drawings and maps for a typical Navy facility;" it was devised so that the agency could "evaluate how well the system achieves the procurement objectives" which are listed at Finding 1. Id. at AT2-12 (¶ 6.1). The latter test was "a series of independent tests that represent typical problems found in common architectural, engineering and cartographic practices," designed to "permit a detailed evaluation of products that shall be provided through the contract." Id. at AT2-9 (¶ 5).
The LTD score "is based upon the accuracy and completeness of the graphic and numeric results produced and the Offeror's demonstrated productivity enhancement features" displayed at both tests. Protest File, Exhibit 4B at M-4 to M-4a ( M3.1(a)). The solicitation stated, "The LTD score . . . is the sum of the scores of the technical features test and the production test. The production test score is slightly more significant than the technical features test score." Id. at M-4a ( M3.1(a)). 7. The production test score was to be "based upon the productivity enhancement, the drawing content, the drawing content coordination and the computation coordination subfactors." Protest File, Exhibit 4B at M-4a ( M3.1.1). Each of these subfactors "consist[s] of a number of graphic and numeric results," which are "of equal weight." Id. ( M3.1.1(b)). The relationship among the subfactors is described in this way: The productivity enhancement subfactor is of the same inportance [sic] as the combination of the other subfactors, and is applied to the other subfactors to obtain the overall production test score. The drawing content, drawing content coordination and the computation coordination subfactors are of equal importance. Id. ( M3.1.1(a)). 8. The technical features test score was to be "based upon the correctness of the Offeror's results [on specified test problems], the productivity enhancement features demonstratedd [sic] in the test, and the relative importance of the test problem[s] as shown." Protest File, Exhibit 4B at M-4b ( M3.1.2). The solicitation stated, "The correct results produced and the demonstrated productivity enhancements are of equal importance for each test problem" for which productivity enhancement was scored. Id. ( M3.1.2(a)). 9. For each of the two tests, a binary score (zero or one, for fail or pass) was assigned for each result other than productivity enhancement features. Protest File, Exhibit 4B at M-4a ( M3.1.1(b)), AT2-10 ( 5.6). The productivity enhancement features score for a test [problem or] period is a combination of the grade for the Man-Machine Interaction, Graphics/Text Manipulation and Data Exchange & Integrity criteria. The 3 criteria are of equal importance, and are graded on a multipoint basis, where 3 is Excellent, 2 is Satisfactory, 1 is Marginal and 0 is Unsatisfactory. Id. at M-4a ( M3.1.1(d)), M-4b ( M3.1.2(b)). 10. A guide prepared for evaluators by the chief of the Navy's technical evaluation team summarizes what the agency had in mind when it labeled these criteria. According to the guide, "The suitability of the man-machine interaction requires evaluating the impact of the system hardware and software features on the operator's performance." Characteristics of "man-machine interaction" include display, pointing device, graphics tablet, keyboard, and window utilization; movement of information to and from peripheral devices and network servers; menus, command shells, and macro commands; and help functions. "The simplicity of the graphic/text manipulation procedures requires focusing on the manipulation process, which is impacted by the structure and hierarchy of the manipulation commands, including the options associated with each command." Characteristics of "graphics/text manipulation" include modeling, drafting, graphing, editing, dimensioning, scaling, and layering features. 
"Excellent data exchange is evident when its occurrence is exceptionally well managed by the software to the extent that it is 'invisible' or 'transparent.'" Characteristics of "data exchange & integrity" include standards utilization, accuracy of imported data and geometric structure, and absence of any need for manual data reentry. Protest File, Exhibit 15F (emphasis added); see also Transcript at 165-66. 11. Grading required evaluators to make 3,771 binary scores on each offeror's production test and 45,543 binary scores on each technical features test. Protest File, Exhibits 8D, 8E, 8F at tables 3, 5 of each. It also required making approximately 320 numerical scores regarding productivity enhancement features shown on the production test and about 680 regarding such features on the technical features test. (Different numbers of entries were made for different offerors.) Id. at tables 2, 4 of each. Productivity enhancement features were graded by two evaluators; each recorded a score of zero, one, two, or three for each of the three stated subfactors. Id. at 7-9 of each. Because of the way in which productivity enhancement scores were calculated, the lowest possible score for productivity enhancement on a technical features test problem was 0.1; and a score of higher than zero would result for any discipline within the production test (geoprocessing, architecture, and civil, electrical, mechanical, and structural engineering) if either evaluator assigned any number other than zero to any subfactor for any session in which that discipline was at issue. Id. at tables 2-5 of each; Transcript at 153. 12. In calculating scores for each offeror, the Navy created a category called "technical excellence" which encompasses both technical and management scores. Protest File, Exhibit 8A at 85. The scores were weighted so that the LTD production test score was worth thirty-five percent of "technical excellence;" the LTD technical features test score was worth twenty-five percent; the technical proposal was worth fifteen percent; and the management proposal was worth twenty-five percent. Id., Exhibits 8A at 59; 8D, 8E, & 8F at 1 of each. For the production test, the maximum possible score for drawing content was 1979, for drawing content coordination was 1402, and for computation coordination was 390. Id., Exhibits 8D, 8E, & 8F at table 5 of each. An offeror's normalized total score for these subfactors was multiplied by its productivity enhancement grade to calculate the production test score. Id. On the technical features test, for each problem, an offeror's raw problem result score was divided by the maximum possible score to determine a "scaled problem result score." This last number was multiplied by the problem's weight and again by the offeror's productivity enhancement score to achieve a "problem score." The normalized total of the offeror's problem scores was the firm's score for this test. Id. at table 3 of each. 13. Centel, FCC, and Intergraph all responded to the solicitation. Protest File, Exhibit 8A at i. The presentation made by each of the three firms met all MMRs. Id. With reference to OTFs -- the scored portion of the presentations -- as calculated by the Navy, the firms were evaluated as having the following technical scores: Evaluation factor Centel FCC Intergraph Production test __%1 __% __% Technical features test __% __% __% Technical proposal __% __% __% Protest File, Exhibit 8A at 85. 
13. Centel, FCC, and Intergraph all responded to the solicitation. Protest File, Exhibit 8A at i. The presentation made by each of the three firms met all MMRs. Id. With reference to OTFs -- the scored portion of the presentations -- as calculated by the Navy, the firms were evaluated as having the following technical scores:

    Evaluation factor            Centel    FCC    Intergraph
    Production test              __%1      __%    __%
    Technical features test      __%       __%    __%
    Technical proposal           __%       __%    __%

Protest File, Exhibit 8A at 85. The Navy based its calculations on the assumption that a maximum of 937.5 points could be scored for these factors. Id., Exhibit 10A at 24. Using these numbers, the Navy's results are as follows:

    Evaluation factor            Centel    FCC    Intergraph
    Production test              ___       ___    ___
    Technical features test      ___       ___    ___
    Technical proposal           ___       ___    ___
    Total technical              ___       ___    ___

Id. As can be seen, Centel's technical score is __ percent of Intergraph's and FCC's is __ percent of that firm's.

____________________
1 All figures used in this and subsequent tables are subject to rounding errors.

14. We have performed our own calculations of the offerors' LTD scores, using without change the raw scores reported by the evaluators. See Protest File, Exhibits 8D, 8E, 8F at tables 2-5 of each. These calculations show the following results.

a. If each of the production test subfactors other than productivity enhancement is given equal weight, and productivity enhancement is given weight separate from but equal to the sum of the others, the production test scores would look like this:

    Evaluation subfactor         Centel        FCC           Intergraph
    Drawing content              __%   __      __%   __      __%   __
    Drawing content coord.       __%   __      __%   __      __%   __
    Computation coordination     __%   __      __%   __      __%   __
    Subtotal                     __%   ___     __%   ___     __%   ___
    Productivity enhancement     __%   ___     __%   ___     __%   ___
    Total                        __%   ___     __%   ___     __%   ___

b. If the technical features test is calculated on the assumption that for each problem, the scaled problem result score and the productivity enhancement score should be separately multiplied by the problem weight, and that the two resulting amounts should be given equal weight, the technical features test scores would look like this:

    Evaluation subfactor         Centel        FCC           Intergraph
    Problem results              __%   ___     __%   ___     __%   ___
    Productivity enhancement     __%   ___     __%   ___     __%   ___
    Total                        __%   ___     __%   ___     __%   ___

c. If we now give the production test five percent more weight than the technical features test (51.2 percent, or 384 points, as opposed to 48.8 percent, or 366 points), and add these scores to the ones for the technical proposals, we calculate the following technical points:

    Evaluation factor            Centel    FCC    Intergraph
    Production test
      Drawing content            __        __     __
      Drawing content coord.     __        __     __
      Computation coordination   __        __     __
      Subtotal                   ___       ___    ___
      Productivity enhancement   ___       ___    ___
      Total                      ___       ___    ___
    Technical features test
      Problem resolution         ___       ___    ___
      Productivity enhancement   ___       ___    ___
      Total                      ___       ___    ___
    Technical proposal           ___       ___    ___
    Total technical              ___       ___    ___

d. Making all these assumptions, Centel's technical score is __ percent of Intergraph's, and FCC's is __ percent of that firm's. Incorporation of the assumptions thus closes the gap between the protesters and Intergraph -- by __ percent in Centel's case and by __ percent in FCC's -- though Intergraph's technical proposal remains the best of the three. See Finding 13.

Discussions

15. The Navy conveyed to offerors through the solicitation an intention to provide a significant amount of information about problems in their proposals and LTDs. In paragraph L25, "Discussions and Contract Award," the agency first said that "discussions will be held to resolve any technical deficiencies found in each offeror's written technical proposal or during an . . . LTD." The term "deficiency" was defined to be "[a]n offeror's failure to satisfy a minimum mandatory requirement." Protest File, Exhibit 4B at L-22 (¶ L25.1).
The contracting officer, who wrote the paragraph, understood that "any technical deficiencies" meant all such deficiencies. Transcript at 833. Centel and FCC had this same understanding. Id. at 463-64, 675, 766.

16. Also in paragraph L25, the agency made the following statement:

    L25.2. Discussion of Technical Weaknesses

    Discussions will be held to inform each offeror of any weaknesses found in [its] written technical or management proposal or during an offeror's LTD. These weaknesses may be found during the [LTD] or during the evaluation of an offeror's written technical or management proposal. Weaknesses relate to scored portions of an offeror's proposal.

Id. (¶ L25.2). Consistent with his understanding of the word "any" in the subparagraph regarding discussions of deficiencies, the contracting officer meant "any weaknesses" in ¶ L25.2 to connote all weaknesses. Transcript at 833. Again, the protesters shared his view. Id. at 463-64, 676, 766.

17. In Attachment 2 to the solicitation, the Navy said that if it noted a "significant technical inconsistency" during the LTD, and the inconsistency was "determined to be detrimental to the successful completion of the test, the [Navy] LTD spokesperson will notify the Offeror." Protest File, Exhibit 4B at AT2-8 (¶ 4.4). In addition, the agency said that "[i]f during the technical features test, the Offeror produces an incorrect numeric or graphic result [a zero where the offeror said it was demonstrating a capability], the Government will document the error as a deficiency if a MMR fails or as a discrepancy if either an OTF fails or the result is an error." Id. at AT2-11 (¶ 5.8). Certain inconsistencies in the production test were also to be identified for offerors. Id. at AT2-9 (¶ 4.5).

18. The term "weakness" is used only in subparagraph L25.2 and nowhere else in the solicitation. It is not defined in the document. See Transcript at 220, 832. The contracting officer defined "weakness" to mean "a failure to earn points in a technical or management area." Respondent's Exhibit 3; Transcript at 836. This definition was never conveyed to the offerors. Transcript at 236, 837, 918. The Navy believed that it was implementing the definition by notifying offerors of the following instances: (a) when a zero was scored for any item in technical and management proposals; (b) during LTDs, when any binary score was a zero; and (c) during LTDs, when any productivity enhancement score was a "triple zero." The last term means that each of two evaluators determined that the offeror's showing of productivity enhancement features during a session was unsatisfactory with regard to each of the three subfactors -- man-machine interaction, graphics/text manipulation and data exchange & integrity. Transcript at 224-37. The Navy's SSEB chairman testified that for the Government to inform an offeror that it had scored in an unsatisfactory fashion on any particular test, or with regard to a productivity enhancement feature subfactor, would not have been "technical leveling," as that term is defined at 48 CFR 15.610(d) (1991). Id. at 322-23.

19. To alert each offeror to problems in its presentation of the LTD, the Navy provided an "LTD Deficiency/Discrepancy Report Form" (DDR) with regard to each failure to meet an MMR or "incorrect numeric or graphic result" found by agency evaluators. Transcript at 898. DDRs for one test session were generally given to an offeror before the next session. Id. at 895, 904-06.
The number of DDRs provided to the offerors is large; we have not counted them, but we have measured that Centel and FCC were each given a stack of forms about two and one-half inches thick. Protest File, Exhibits 5A, 5B. DDRs did not note weaknesses in productivity enhancement features, however. Transcript at 143. An offeror could not identify weaknesses related to its productivity enhancement features based on the information provided on a DDR, because, as explained by FCC's LTD manager, a "productivity enhancement feature is how do you do it, and a discrepancy is what did you do." Id. at 440-45, 484-85, 690-91.

20. The Navy also informed each offeror, during the LTD, of certain technical inconsistencies -- primarily, whether the offeror was "going down the wrong path" and not doing constructive work. Transcript at 106-07, 240; see Protest File, Exhibit 4B at AT2-8 to AT2-9 (¶ 4.4). This form of notice did not relate to the quality of productivity enhancement features being demonstrated. Transcript at 240-41.

21. The solicitation permitted an offeror to make repairs (including program corrections or reprogramming) and take retests during the LTD. Protest File, Exhibit 4B at M-3 (¶ M2.3.3), AT2-11 (¶¶ 5.8, 5.9), AT2-12 (¶ 6.7). Upon receiving a DDR, an offeror could request a retest for the purpose of passing a failed MMR or to receive a "correct" scored result. Transcript at 900-02. The only limitations on reprogramming and repairs during the LTD for the purpose of improving scores were that the modifications had to have been made before the LTD was completed and had to have been observed by agency evaluators. Id. at 28. Productivity enhancement features could be reevaluated on a retest. Id. at 901.

22. Prior to the LTDs, the Navy told each offeror that it would not be informed of its productivity enhancement scores during the LTD. Transcript at 337, 527, 706-07, 922, 969-70. The agency never stated that it would provide no information about weaknesses in productivity enhancement features, however. Id. at 228-30, 241, 337, 461, 466, 720, 765.

23. During the evaluation of productivity enhancement features at the LTD, Navy personnel recorded notes on an evaluation form relating to the features they observed. Immediately after the conclusion of the test session, the evaluators reviewed their notes and assigned scores for each of the three productivity enhancement subfactors, along with justifications for the scores. The scores were then recorded in a log prior to the beginning of the next test session. Transcript at 85-86, 112, 416-17, 881-83, 956-57. The evaluators, in accordance with their instructions, did not discuss with offerors the merits of productivity enhancement features that had been demonstrated; communications were limited to questions of clarification. Id. at 142-44, 347-48, 413-15, 419-20. The Navy provided only one other type of feedback to offerors about their demonstrations of these features at the LTDs: the Navy's chief technical evaluator would occasionally give generalized oral reminders that the vendor needed to show "productivity enhancement" as it performed the production and technical features tests. Id. at 91-92, 911-12.

24. According to instructions given to evaluators of productivity enhancement features, among the four possible scores (zero through three, see Finding 9) which could be assigned for each subfactor, a rating of two was necessary to "be considered meeting the RFP requirements." Protest File, Exhibit 15F at PE-5.
In making recommendations relative to the Navy's award decision, the SSEB and the SSAC both considered low scores on productivity enhancement features to be weaknesses relative to requirements of the solicitation. Protest File, Exhibits 10A at 21-22, 14A, 14B; Transcript at 285-86, 289-93.

25. Centel received a score of zero or one from at least one evaluator as to ___ productivity enhancement subfactors, such that the firm's score for that subfactor was less than two. Protest File, Exhibit 8E at tables 2, 4. FCC received ___ such scores, and Intergraph received ___. Id., Exhibits 8D, 8F at tables 2, 4. A "triple zero" was achieved only once -- by __________. Id., Exhibits 8D, 8E, 8F at tables 2, 4 of each, and particularly Exhibit 8F at table 4; see Finding 18.

26. We have recalculated the LTD scores for the purpose of assessing prejudice that might have resulted from the lack of discussions about productivity enhancement features (see Discussion, below). Specifically, we have recalculated the scores for the protesters on the assumption that for each productivity enhancement subfactor for which the offeror received a score of less than two, a retest would have produced a score of three. Taking as a base the scores shown in the tables at Finding 14(a) and (b), the productivity enhancement scores for Centel would be __ percent of the maximum possible on the production test and __ percent on the technical features test; and for FCC, they would be __ percent and __ percent, respectively. Multiplying these percentages by the points available as per the assumptions governing the table at Finding 14(c), and keeping Intergraph's scores constant, the result is as follows:

    Evaluation factor            Centel    FCC    Intergraph
    Production test
      Production results         ___       ___    ___
      Productivity enhancement   ___       ___    ___
      Total                      ___       ___    ___
    Technical features test
      Problem resolution         ___       ___    ___
      Productivity enhancement   ___       ___    ___
      Total                      ___       ___    ___
    Technical proposal           ___       ___    ___
    Total technical              ___       ___    ___

Thus, if each offeror had been informed of all its low productivity enhancement scores, and had improved those scores to a top rating, Centel's technical score would have been __2 percent higher than Intergraph's, and FCC's would have been __ percent higher.3

____________________
2 Here and in a few other places in this opinion, extra spaces have been inserted in the printing so that when redactions are made to protect source-selection-sensitive and proprietary information, the version of the opinion which is released to the public does not reveal information which ought to remain confidential.
3 If Intergraph's LTD were rescored in the same manner, changing all zero and one productivity enhancement scores to threes, Intergraph's results would be as follows: production test -- productivity enhancement, ___; total, ___; technical features test -- productivity enhancement, ___; total, ___; total technical factor score, ____.

27. According to testimony by their proposal managers and LTD team leaders, if Centel and FCC had been informed of their low productivity enhancement scores during the LTD, they could have taken several kinds of corrective action. They could have modified their approach for the remainder of the LTD to have their workstation operators explain more about the actions they were taking, and to use more of the mouse-based graphical user interface features and macros that were inherent in their systems. The offerors could have requested retests to do these same things with regard to tests on which they had scored poorly. They could have modified software programs to increase the number of features that could be shown, or possibly even made small substitutions in software. Transcript at 482-85, 686-87, 691-92, 727-28, 756-58, 772-73. Even if an offeror had been told after its LTD about a low productivity enhancement score, protester witnesses testified, the firm could have lowered its price to compensate for the weakness in its technical proposal. Id. at 480, 533-34, 775. (Centel and FCC each set its price on the belief that because it had not been notified of weaknesses in its LTD productivity enhancement scores, it had done well in that area at its LTD. Id. at 472-77, 533, 775; FCC Exhibit 6.)
Award decision

28. The solicitation provided that a contract would be awarded to the offeror which "provides the greatest value, price and other factors considered, to the Government." Protest File, Exhibit 4B at M-3 (¶ M2.6). The solicitation gave the SSA, in making the award decision, the right to "balance the technical merits of each proposal against the proposed overall cost to determine the greatest value to the Navy." The SSA was "not [to] be strictly bound by point scores;" he had the discretion to award to a lower-priced proposal with less technical excellence, or to a higher-priced proposal with greater technical excellence. Id. at M-23 to M-24 (¶ M8).

29. The solicitation did not envision that the SSA would make his decision in a vacuum, however; it also prescribed the structure for the evaluation which would inform his judgment:

    Proposals will be evaluated using three factors: 1) Technical, 2) Management, and 3) Price. Technical will account for 75 percent of the score and management will account for 25 percent. Although price will not be mathematically scored, price is a significant factor in the evaluation. Technical is the single most important factor and management is the least important factor.

Protest File, Exhibit 4B at M-4 (¶ M3).

30. The scoring of technical presentations has been fully described above. The Navy calculated that Intergraph's presentation was the best, with Centel's and FCC's rated at __ and __ percent as good, respectively. Finding 13. We have made several calculations as to what the scores might have been if different assumptions had been made. Our final calculation was that if various factors had been assigned certain weights and discussions as to low productivity enhancement scores had enabled each protester to maximize those scores, Centel's and FCC's technical presentations would each have edged past Intergraph's in points. Finding 26.

31. Management proposals were scored on a multipoint basis under which, with regard to each of the stated specifications, an offeror received a zero for "feature is unsatisfactory or not offered," one for "meets the criteria," two for "exceeds the criteria," or three for "feature is outstanding." Protest File, Exhibit 4B at M-10a (¶ M3.2.2). A maximum score of 312.5 was available for a management proposal. Id., Exhibit 10A at 24. Scores assigned were ___ for Centel, ___ for FCC, and ___ for Intergraph. Id., Exhibits 8A at 85, 10A at 24. Thus, Intergraph's management score was the highest, and Centel's and FCC's were __ and __ percent of Intergraph's, respectively.

32. Prices proposed by the offerors were evaluated on a present value, life cycle cost basis. All three offerors proposed prices which, on this basis, were between $___ million and $___ million.
Centel's evaluated price was $___,___,___; FCC's was $___,___,___; and Intergraph's was $___,___,___. Protest File, Exhibit 8A at 96-110. As can be seen, ______'s price was the lowest, and ______'s and Intergraph's were $__,___,___ and $__,___,___ (or __ and __ percent) higher than ______'s, respectively.

33. In preparation for the source selection decision, the SSEB made a "productivity analysis" which purported to assess the relative value of the three proposals. Protest File, Exhibit 8A at 122-32. The analysis makes passing reference to "technical excellence" scores, which combine technical and management evaluation results. These scores were as follows: Centel, ___ (__._ percent of maximum); FCC, ___ (__._ percent); and Intergraph, ___ (__._ percent). Based on these numbers, the SSEB concluded the difference between Intergraph's technical excellence score and Centel's was __._ percent, and between Intergraph's and FCC's was __._ percent. Id. at 125.

34. The productivity analysis focuses on two other items, however -- productivity enhancement scores and labor costs. The SSEB and the SSAC chairpersons both considered that the former was the single most important element of the analysis. Transcript at 268, 548. The SSEB calculated a "productivity enhancement grade" for each offeror, based on scores achieved for each of the three productivity enhancement subfactors (man-machine interaction, graphics/text manipulation, and data exchange & integrity) during each of the two LTD tests. Centel's grade was ____ percent; FCC's was ____ percent; and Intergraph's was ____ percent. The evaluation board concluded that the difference between Intergraph's productivity enhancement grade and Centel's was ____ percent, and between Intergraph's and FCC's was ____ percent. Protest File, Exhibit 8A at 124.

35. The SSEB then made various assumptions about the pay grades of the individuals who would be using the workstations to be supplied under the contract resulting from this procurement, the rate of inflation, and the amount of time an average workstation would be in use. Using what the evaluation board believed were conservative estimates, the cost of labor associated with these workstations, over the twelve-year life of the contract, was estimated to be $___,___,___. The SSEB stated, "The magnitude of the labor cost amplifies the importance and benefits of the productivity of the systems. Based on this analysis, every 1% increase in productivity yields approximately a $_________ savings in labor costs." Protest File, Exhibit 8A at 125-27.

36. The SSEB's next step was to divide the cost difference between each set of offerors' prices by one percent of the twelve-year labor cost ($__________). The evaluation board found these results: the difference between FCC's price and Centel's yields a result of ____; between Centel's price and Intergraph's, ____; and between FCC's price and Intergraph's, ____. The SSEB concluded that for one offeror to offset another's price advantage, it would need to have a technical proposal that would result in a "productivity increase" of at least these last calculated results. Protest File, Exhibit 8A at 128.
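The structure of this break-even computation is simple division, which the sketch below restates with hypothetical dollar amounts (the actual figures are redacted from the public record); the variable names and numbers are illustrative assumptions only.

    # Hypothetical figures; the record amounts are redacted.
    labor_cost_12yr = 500_000_000.0   # assumed twelve-year labor cost (Finding 35)
    price_low = 150_000_000.0         # assumed lower evaluated price
    price_high = 170_000_000.0        # assumed higher evaluated price

    # Finding 35: every 1% increase in productivity was estimated to save
    # approximately 1% of the twelve-year labor cost.
    savings_per_point = labor_cost_12yr / 100

    # Finding 36: the "productivity increase" the higher-priced offeror
    # would need in order to offset its price disadvantage.
    breakeven = (price_high - price_low) / savings_per_point
    print(f"required productivity increase: {breakeven:.1f}%")   # 4.0%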
37. The SSEB understood that "[t]he productivity enhancement evaluation was not intended to be an absolute measure of productivity, but rather a relative measure," or "a relative indicator[] of the expected productivity of the offered solutions." Protest File, Exhibit 8A at 124, 131. The evaluation board compared the differences between productivity enhancement grades, see Finding 34, with the differences in productivity thought necessary to offset price disparities, see Finding 36. It concluded that because Intergraph's grade advantage over both Centel and FCC was so much greater than the necessary number (____ versus ____; ____ versus ____), "meaningful differences exist between the offered systems and . . . the impact of the use of the system is very significant." Protest File, Exhibit 8A at 132.

38. The SSAC, like the SSEB, recognized that a one-to-one relationship does not exist between productivity enhancement grades and actual productivity of a CAD system. Transcript at 540-41, 1285. Nevertheless, the SSAC chairperson thought that the grade "gives the average person a good gauge of what productivity could be expected from [a] solution." Id. at 580. She testified, "[T]he effect of the scoring gave us a very high confidence level that the real productivity would be achieved by at least the ____ percent that we stated was necessary to offset the cost difference between the high and low [rated] tech[nical] proposals." Id. at 540; Protest File, Exhibit 11 at 5. The SSA's decision to award the contract to Intergraph was based in large measure on this same analysis. Transcript at 1290-96.

39. The Navy considered that the differences in scores of the three offerors' management proposals were insignificant. Management was therefore not a significant discriminator in the source-selection decision. Transcript at 557, 1296.

40. The SSEB and SSAC chairpersons both understood that price could be given a degree of importance, as required by the solicitation, whether it was scored or not. Transcript at 208-09, 212, 567. The SSAC chairperson understood that offerors' prices could be compared in some sort of mathematical way -- such as by saying that one price is a percentage of another -- although they could not be scored against a preexisting scale. Id. at 1321-24. In performing the analysis of the relative merits of the offerors' proposals, however, the Navy did not apply any mathematical relationship of price to technical and/or management factors. Id. at 581-84, 1322-23. Nor could Intergraph's expert witness discern from the agency's statements the relationship that was applied. Id. at 1265-68.

41. Intergraph's expert witness conceded that if price is assigned a value somewhere between technical merit and the worth of management proposals, even if no scores are changed, Intergraph is not highest rated over the entire range of values. Transcript at 1214, 1265-68; Intergraph Exhibit 1 at 34.

42. The Navy never attempted to quantify the productivity gain associated with any offeror's proposed solution. Transcript at 538, 543. Indeed, the technical evaluation team leader acknowledged that in measuring productivity enhancement features, the Government was not measuring productivity. Id. at 193.

43. As explained by Centel's expert witness, the productivity associated with a computer system is dependent on the amount of time taken to perform a job and the quality of the performance. Transcript at 595. More specifically, the amount of time is dependent on the nature of the job, the nature of the user, the hardware and software tools available, and the procedures that are required to work with the tools. Id. at 612. The expert's testimony was that the number of tools in a system, standing alone, has no necessary correlation to the time factor. Id. at 595, 598.
44. The Navy understood that speed of completion of a job is a critical factor in assessing the productivity associated with a particular system. Protest File, Exhibit 15F at PE-1. Nevertheless, the agency never measured time, either directly or indirectly, as part of the evaluation of productivity enhancement features or the productivity analysis. Transcript at 394. The Navy simply assumed a relationship between the number of productivity enhancement tools in a "toolbox" and speed of operation. Id. at 184.

45. When asked to give an example of a productivity enhancement feature, Intergraph's proposal manager mentioned the "place text" command. Intergraph, to demonstrate in the LTD its storehouse of tools for putting text on the computer screen, showed that it could place words variously between two points, on a line, on an arc, on an angle, at one edge of the screen, or in different sizes. The command may have been demonstrated ten times in different ways. Transcript at 992-96. As explained by Centel's expert witness --

    [T]he goal of the CAD operator is not to do 3-D modeling; the goal of the CAD operator is to design a building. If you just have . . . the vendors run through all the various ways they have of doing 3-D modeling on their system or all the various modeling types when it's not particularly appropriate to the job being done, then you get a really false picture of how productive the system will be when it's actually used for that job.

Id. at 1337-38.

46. The expert explained further that even if one vendor proposes a particular tool which is superior to another vendor's tool, the better tool will not lead to more productivity unless it can profitably be put to use in performing a particular task. For example, if vendor A offers a better mouse than vendor B, but the task involves creating text, using the mouse would be vastly less efficient than using a keyboard, so the rational user would employ a keyboard and vendor A's offering of the better mouse would have no impact on the productivity of either vendor's system. Transcript at 648. Similarly, FCC's LTD manager testified, "[P]roductivity enhancement features or factors are dependent very much on an individual. What is a productivity enhancement feature for one person is a drag for somebody else." Id. at 686.

47. The Navy's CAD 2 program manager understood that some productivity enhancement tools (such as compressed keystrokes) are especially useful to experts, whereas others (such as long menu trees) benefit novice users of a CAD system. Transcript at 384-85. Intergraph's expert witness similarly believed that the items within the productivity enhancement subfactor called "man-machine interaction" would be of importance primarily to novices. Id. at 1262-63.

48. The SSAC chairperson and her council believed, in reviewing the SSEB's productivity analysis, that productivity enhancement features were especially important because "there were largely untrained users that would be using these systems." Transcript at 1285. The instructions to evaluators, however, stated that a "representative Government operator" "has used CAD systems for several years, has worked with more than one CAD or CAE system, and generally has had training on multiple systems. [He is] generally [a] frequent user[] of CAD systems and CAE application software in accomplishing assigned engineering tasks." Protest File, Exhibit 15F at PE-1.
The productivity analysis was based on the assumption that the typical user will use the system an average of four hours each day. Id., Exhibit 8A at 126.

Discussion

Perhaps the most frustrating aspect of this case is that in the subject procurement, the Government appears to have done all the hard things well, but has failed in interpreting some relatively simple rules that it set for itself. This procurement is for the acquisition of complex computer hardware and software. Finding 1. It has consumed more than two years. See id. The Navy made an enormous effort to describe the items it was interested in receiving, developing 418 solicitation pages containing thousands of specifications. Finding 2. The procurement was specially designed to emphasize a live test demonstration (LTD) in which the capabilities of the offerors' systems, encompassing those products, would do "real work." Findings 2, 4. The LTDs were comprehensive, involving the assignment of about fifty thousand scores by Government evaluators. Finding 11. The Navy cranked out hundreds of statements to offerors about problems they encountered in their demonstrations. Finding 19. The agency carefully evaluated proposals through the use of a Source Selection Evaluation Board (SSEB) and Source Selection Advisory Council (SSAC), and the Source Selection Authority (SSA) considered the recommendations of these panels before making his award decision. Findings 3, 28-38.

Despite all the planning and all the effort, however, the conduct of the procurement has been irretrievably compromised by unreasonable interpretations of solicitation provisions, which led to improper decisions regarding LTD scores and inadequate discussions. In addition, the process of assessing the relative merits of proposals, which was essential to selecting an offeror for award, ignored the strictures of the solicitation and was prejudicially flawed.

Scoring of technical aspects of proposed systems

Centel and FCC allege that the Navy made three mistakes in assigning weights to scores achieved by the offerors in running their LTDs. In each instance, protesters maintain, the agency deviated from the scoring schema it set forth in the solicitation. The Board's governing principle in reviewing such contentions is that where an offeror does not object to a provision of a solicitation, it must be content with any reasonable interpretation of that provision which promotes full and open competition. Xerox Corp., GSBCA 9862-P, 89-2 BCA ¶ 21,652, at 108,922, 1989 BPD ¶ 68, at 20. In appraising the reasonability of an interpretation, we read the solicitation as a whole and avoid making any one portion of it superfluous. Lewis Associates, Inc., GSBCA 10352-P, 90-1 BCA ¶ 22,541, at 113,112, 1989 BPD ¶ 395, at 4-5; Rocky Mountain Trading Co. - Systems Div., GSBCA 9737-P, 89-1 BCA ¶ 21,456, at 108,121, 1988 BPD ¶ 307, at 5; Hughes Advanced Systems Co., GSBCA 9601-P, 88-3 BCA ¶ 21,115, at 106,602, 1988 BPD ¶ 185, at 8.

We ascribe no significance to the facts that the scoring schema was devised before the solicitation was written, and that in composing the document, the agency was attempting to explain what it already had in mind. The agency's interpretation must be reasonable in light of the words of the solicitation because the offerors have no insight other than that paper into the agency's intent. They prepare their proposals in the expectation that those efforts will be judged against certain standards, and the agency is required to apply those standards in the evaluation process. 10 U.S.C.A. § 2305(b)(1) (West Supp. 1992); 48 CFR 15.608(a) (1991) (FAR 15.608(a)).
A. Protesters first complain that with regard to one of the two parts of the LTD, the production test, the Navy assigned improper weights to three of the subfactors. The solicitation stated, "The drawing content, drawing content coordination and the computation coordination subfactors are of equal importance." Finding 7. As the LTD was actually conducted, the maximum possible score for drawing content was 1979, for drawing content coordination was 1402, and for computation coordination was 390. Finding 12. In rating the three subfactors, the Navy did not distinguish among them; it simply added these numbers together to form a total. Id. The agency now insists that its practice was permissible because the solicitation also explains that each subfactor "consist[s] of a number of graphic and numeric results," which are themselves "of equal weight." See Finding 7.

The word "equal" means "of the same measure, quantity, amount." Corporate Jets, Inc., GSBCA 11049-P, 91-2 BCA ¶ 23,998, at 120,119, 1991 BPD ¶ 111, at 19. The Navy's interpretation gives life to one use of the word in the solicitation (as to graphic and numeric results), but at the expense of the other use (as to the subfactors themselves). As the agency actually scored proposals, drawing content was 5.3 times as important as computation coordination, and drawing content coordination was 3.8 times as important as that subfactor. This interpretation is clearly unreasonable; it constitutes a violation of statute and regulation. Id.

B. Protesters next contend that the Navy obligated itself, in the solicitation, to give the three subfactors described above weight equal to that given the fourth production test subfactor, productivity enhancement features.4 Centel and FCC point to the sentence which states, "The productivity enhancement subfactor is of the same inportance [sic] as the combination of the other subfactors, and is applied to the other subfactors to obtain the overall production test score." See Finding 7. As the test was actually scored, the Navy multiplied the score for the first three subfactors by the score for productivity enhancement to calculate the total for this part of the LTD. Finding 12.

____________________
4 A description of "productivity enhancement features" is set out at Finding 10. Further discussion of what these features really are is at Findings 43-47 and below, under the heading "Award decision."

In maintaining that this action was reasonable, the Navy and Intergraph focus on the second portion of the solicitation sentence -- one score "is applied to" the other to determine a total. Does "is applied to" connote addition or multiplication? Intergraph thinks there is no question: it "unambiguously denotes multiplication." Intergraph's Posthearing Brief at 22. The Navy is somewhat less certain; it says that the phrase "commonly designates a multiplication exercise." Navy's Posthearing Brief at 29. We have searched Webster's Third New International Dictionary 105 (1986) for assistance in this matter, but cannot find that any of the many definitions of the word "apply" is particularly helpful. The word is not defined in the Federal Acquisition Regulation (FAR), 48 CFR ch. 1 (1991). No party has presented any expert testimony as to a special, generally understood use of the word among mathematicians.
By looking at the entire solicitation, however, we are able to determine that any interpretation of the word that conveys any meaning other than addition is unreasonable. The concept of the solicitation is that the scored portions of offerors' presentations, technical and management, will be evaluated such that the LTD is worth sixty percent of the total score and written proposals are worth forty percent. Findings 4, 29.5 Under the Navy's scoring structure, the LTD came to be worth only forty-nine percent of the total, and the written proposals fifty-one percent. See Findings 13, 31. If the scoring is revised so as to combine productivity enhancement and other subfactors by addition, rather than multiplication, the result is that the LTD is worth fifty-six percent and the written proposals forty-four percent.6 See Findings 14(c), 31. The second analysis comports far more closely with the agency's intent in evaluating the merits of the various offerings. The reason for the difference is that in multiplying a fraction of a maximum possible score by another fraction of another maximum possible score, the two scores become mutually dependent and any deficiency in either decreases the importance of the other. The result is to diminish the value of the LTD and increase the other scored portions of an offeror's presentation, which apparently were figured on an additive basis.

____________________
5 Technical is worth seventy-five percent and management twenty-five percent. The LTD is to represent eighty percent of the technical score, and the technical proposal is to represent twenty percent. Thus, the LTD is sixty percent (0.8 times 0.75) of the total; the technical proposal is fifteen percent (0.2 times 0.75); and the management proposal is twenty-five percent.
6 This calculation takes into account the same type of revision of scoring for the technical features test. See ¶ D, below.

Perhaps this example will aid in understanding the difference in calculation. Suppose one hundred points were available on the production test, and an offeror scored ninety percent of maximum on each of the drawing content, drawing content coordination, and computation coordination subfactors, but only twenty percent of maximum on the productivity enhancement subfactor. By the Navy's way of thinking, the vendor's score for the production test would be 0.9 times 0.2 times 100, or eighteen. If instead the two portions of the test are given independent importance, the score is 0.9 times 50, or 45, plus 0.2 times 50, or 10. The total is fifty-five.

In our judgment, the solicitation in no way puts offerors on notice that the production test will be scored in the Navy's manner, making a competitor score high on both portions of the test in order to do well on any of it.
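The same arithmetic can be checked mechanically. The fragment below simply restates the example in Python; nothing in it goes beyond the numbers already given.

    results_fraction = 0.9   # 90% of maximum on the three result subfactors
    pe_fraction = 0.2        # 20% of maximum on productivity enhancement
    max_points = 100

    # The Navy's reading: one score multiplies the other.
    multiplicative = results_fraction * pe_fraction * max_points     # 18.0

    # The reading consistent with "same importance": each half scored
    # independently, and the halves added.
    additive = (results_fraction + pe_fraction) * (max_points / 2)   # 55.0

    print(multiplicative, additive)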
C. Third, protesters call to our attention the disparity between the solicitation's statement that "[t]he production test score is slightly more significant than the technical features test score" and the Navy's weighting of the former as thirty-five percent of "technical excellence" and the latter as twenty-five percent. Findings 6, 12. ("Technical excellence" means the sum of technical and management scores. Finding 12.) Protesters say that thirty-five percent is forty percent more than twenty-five percent. The Navy and Intergraph say that it is only ten percent more, and that ten percent is a "slight" difference.

The Navy and the awardee have trouble with their mathematics. The distinction they draw between the two numbers is simply wrong; "slightly more significant" indicates a comparison, and the Navy's math has no regard for the relative weights of the two tests. A little knowledge of baseball might help one to appreciate why. If considering American League infielders without much power, no one would quarrel if told that Manuel Lee (.254) has been a "slightly" better hitter over his major league career than Billy Ripken (.244) has been. The difference between the two averages is four percent. Some people might also agree that Ozzie Guillen (.266) has been "slightly" better than Ripken. The difference between their two averages is nine percent. But not even Cal Ripken, Sr., would say that Wade Boggs (.338, even after the 1992 season) has been only "slightly" better than his son Bill. Boggs' average is thirty-nine percent higher; though it is "only" .094 (or 9.4 percent, in the Navy's terms) better than Ripken's, he is an altogether different breed of hitter. See USA Today, Oct. 6, 1992, at 13C; The Sporting News Official Baseball Register 1992 47-48, 187-88, 278, 409 (1992).

In an earlier protest decision, we held that a difference of thirty-three percent between two factors was inconsistent with the solicitation's statement that the first was worth "slightly more" than the second. Arthur Andersen & Co., GSBCA 8870-P, 87-2 BCA ¶ 19,922, at 100,808, 100,815, 1987 BPD ¶ 94, at 6-7, 18. In differing site condition cases, figures as low as twenty percent have been found to be materially different from one another. Cherry Hill Construction, Inc. v. General Services Administration, GSBCA 11217 (June 9, 1992), slip op. at 16-17. We do not decide whether a ten percent difference between factors would make one "slightly more significant" than the other; that is not the issue presented here. Rather, we are asked to decide whether a forty percent difference is "slight." We have no difficulty in answering that in the negative. It is sufficiently great that a reasonably prudent offeror might have constructed an altogether different proposal if it had known of the agency's intentions in advance.
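The competing readings of "slightly more significant" reduce to relative versus absolute comparison, which a few lines make explicit; the only figures used are the weights from Finding 12.

    production_weight = 35.0   # percent of "technical excellence" (Finding 12)
    features_weight = 25.0

    absolute_difference = production_weight - features_weight     # 10 points
    relative_difference = absolute_difference / features_weight   # 0.40

    # The Navy compares absolute percentage points; the protesters compare
    # the weights themselves: 35 is 40% greater than 25.
    print(f"{absolute_difference:.0f} points; {relative_difference:.0%} greater")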
A "deficiency" is by regulatory definition "any part of a proposal that fails to satisfy the Government's requirements." FAR 15.601. The solicitation written by the Navy for this procurement was entirely consistent with these rules. It said that discussions would be held to resolve "any technical deficiencies found . . . during an . . . LTD." Finding 15. By "any," the Navy meant, and the offerors understood, that the agency would engage in discussions about all deficiencies. Id. Finally, a "deficiency" was "a failure to satisfy a minimum mandatory requirement." Id. Protesters raise no objection to the conduct of discussions insofar as they related to deficiencies. The Navy imposed on itself in the solicitation, however, an additional burden regarding discussions. Discussions were also required to "be held to inform each offeror of any weaknesses found . . . during an offeror's LTD." Finding 16 (emphasis added). The solicitation said that weaknesses "relate to scored portions of an offeror's proposal. Id. As with deficiencies, "any" was supposed to mean that all weaknesses were discussed. Id. Any notification of a weakness during an LTD was important because offerors were permitted to make repairs (including program corrections or reprogramming) and to take retests during the sessions. Finding 21. With the benefit of information about weaknesses, therefore, an offeror could improve its scores and thus enhance its chances of being selected for contract award. Id. Centel and FCC contend that they were not informed of weaknesses that the Navy perceived in the productivity enhancement features demonstrated by protesters in the LTDs of their proposed systems. The record is clear that although the Navy gave offerors significant amounts of information as to poor scores on other aspects of the tests, it provided precious little news as to the display of productivity enhancement tools. Findings 19, 20, 23. The Navy and Intergraph essay several defenses to the allegation, however. The Navy first contends that its pre-LTD statements to offerors, that vendors would not be informed of their LTD productivity enhancement scores, was a clear indication that weaknesses in productivity enhancement features would not be the subject of discussions. See Finding 22. Thus, the Navy says, the protests are untimely as to this matter. See Rule 5(b)(3)(ii). "No scores" and "no weaknesses" are very different things, however. Scores were assigned on a multipoint basis; they could be zero (unsatisfactory), one (marginal), two (satisfactory), or three (excellent). Finding 9. "No scores" simply means "no numerical values;" it does not mean that no information would provided about productivity enhancement evaluations. No statement was ever made limiting the solicitation's announcement that all weaknesses would be discussed. Finding 22. Next, the Navy and Intergraph put great emphasis on the Board's oft-repeated standard that the conduct of discussions is within the discretion of the procuring agency, and the agency's "judgment in such matters must be respected and sustained unless clearly defective." Genasys Corp., GSBCA 8734-P, 87-1 BCA 19,556, at 98,848, 1986 BPD 224, at 11; see also, e.g., CACI, Inc. 
Health Systems Technology Corp., GSBCA 10920-P, 91-2 BCA 23,692, at 118,642, 1991 BPD 20, at 12; Advanced Technology, Inc., GSBCA 8878-P, 87-2 BCA 19,817, at 100,272, 1987 BPD 67, at 33; DALFI, Inc., GSBCA 8755-P, 87-1 BCA 19,552, at 98,806, 1986 BPD 228, at 22, reconsideration denied, 87-1 BCA 19,584, 1987 BPD 15. We have specifically held that this principle is applicable where the agency is required to discuss weaknesses as well as deficiencies. Diversified Systems Resources, Ltd., GSBCA 9493-P, 88-3 BCA 21,017, at 106,171, 1988 BPD 154, at 10. We respect the principle and adhere to it in this case. We conclude, however, that the Navy's judgment in providing virtually no information about weaknesses in demonstrations of productivity enhancement features was clearly defective.

A principal difficulty in scrutinizing this matter is that although the term "weaknesses" is of great importance, it does not have an obvious meaning. A weakness is clearly a problem less severe than a deficiency, but more particularly, what is it? The solicitation itself not only fails to define the term, but also uses it only once, so that no meaning can be ascertained from context. Finding 18. Nor does the FAR provide a definition. The dictionary says only that a "weakness" is "something that is a mark of lack of strength or resolution." Webster's Third New International Dictionary 2589 (1986). Centel asserts in its posthearing brief, at 10, that the meaning of the term can be gleaned from some decisions of the Comptroller General; we have reviewed these decisions, however, and do not find them helpful. Clearly, we would like to be able to be more precise in specifying a meaning than to say, as Mr. Justice Stewart did about hard-core pornography, "I know it when I see it." Jacobellis v. State of Ohio, 378 U.S. 184, 197 (1964) (concurring opinion). Without better assistance, though, we will have to focus on the meaning of the term in this specific procurement. For that, we look to the Navy's actions in interpreting the word.

With regard to all aspects of scored portions of offerors' presentations other than LTD productivity enhancement features, the Navy considered a score of zero to be a weakness. Finding 18. Other parts of the LTDs were scored in such a way that the only possible results were zero or one; a zero signified that although the offeror believed it was demonstrating a capability, the Navy found that it was not doing so. Findings 9, 17. Management proposals were scored differently; a zero meant "feature is unsatisfactory or not offered," one meant "meets the criteria," and two or three was assigned when the item exceeded the criteria. Finding 31. In all of these areas, then, any score above zero met the agency's specification. With regard to LTD productivity enhancement features, evaluators were instructed to assign a score of less than two when the offeror did not "meet[] the RFP [request for proposals] requirements." Finding 24.[7] These low scores were later considered weaknesses relative to solicitation requirements by the SSEB and the SSAC. Id. We consequently conclude that by the standards the Navy set and actually used for evaluating LTDs, any score of less than two for a productivity enhancement feature was a weakness. See Health Systems Technology Corp., 91-2 BCA at 118,642, 1991 BPD 20, at 11 (marginal acceptability warrants discussions).
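The standard the Navy set and used may be summarized as a simple rule for any score s; the notation is ours, not the solicitation's or the agency's:

\[
\text{weakness}(s) =
\begin{cases}
\text{yes}, & s < 2 \text{ (LTD productivity enhancement features)} \\
\text{yes}, & s = 0 \text{ (all other scored areas)} \\
\text{no}, & \text{otherwise.}
\end{cases}
\]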
Judged against this guide, the agency's decision to limit the definition of productivity enhancement weakness to a score of "triple zero" in a test session, Finding 18, was arbitrary and an abuse of discretion. Indeed, it is especially egregious given that the number of productivity enhancement scores was so much smaller than the number of other LTD scores, for a single low score on productivity enhancement had a much more debilitating effect on the total score than did a single low other score. See Finding 11.

____________________
7 The Navy contends, in its posthearing brief (at 23), that it "considered any points earned in the area of productivity 'enhancement' as an indication of strength rather than a weakness." This statement is completely at variance with the facts, and the transcript pages the Navy references contain no statements in support of the agency's assertion.

Another defense attempted by the Navy is that the provision of information to offerors about weaknesses in their demonstrations of productivity enhancement features would have been "technical leveling." This is a proscribed activity which is defined as "helping an offeror to bring its proposal up to the level of other proposals through successive rounds of discussion, such as by pointing out weaknesses resulting from the offeror's lack of diligence, competence, or inventiveness in preparing the proposal." FAR 15.610(d). The short answer to this contention is that the Navy's own SSEB chairman admitted on the witness stand that simply informing an offeror of instances in which it had achieved a low score on productivity enhancement would not be technical leveling. Finding 18. This testimony is consistent with the agency's unchallenged view that informing a vendor of instances in which it had achieved a low score on any other area of the LTD was not technical leveling. See id. Conveying the fact that a productivity enhancement feature was evaluated as less than satisfactory would clearly not be the kind of action regarding weaknesses that involves leveling.

Further along this line, the Navy notes that in some protests, we have found reasonable, in avoiding the possibility of technical leveling, agencies' decisions not to advise offerors of weaknesses in their proposals. Orkand Corp., GSBCA 11405-P, 92-1 BCA 24,624, at 122,831, 1991 BPD 320, at 10-11, aff'd, No. 92-1077 (Fed. Cir., Oct. 9, 1992); OAO Corp., GSBCA 10186-P, 90-1 BCA 22,332, at 112,235, 1989 BPD 296, at 10. These cases are not applicable to the procurement now before us. They involve procurements for computer support services, in which the inadequacies were in offerors' responses to questions that sought solutions to hypothetical task assignments. The agencies were attempting to gauge the firms' ability to staff particular situations; the Government was buying management capabilities, and to understand the differences among them, it was testing initial reactions. In the instant procurement, however, the Government is acquiring computer systems with various technological capabilities. Those capabilities are present in a system or not (or can be made to be present or not), regardless of whether they were actually demonstrated at LTDs. The main purpose of advising as to weaknesses was consequently to permit offerors to show more of what their systems could already do, so that the agency might get a fuller look at its alternatives. In such a situation, giving notice of weaknesses is not technical leveling. Genasys, 87-1 BCA at 98,849, 1986 BPD 224, at 12.
Most of the other defenses offered to this protest count revolve around logistical difficulties. The Navy and Intergraph maintain that the Navy did not know, during the LTDs, how offerors were scoring with regard to productivity enhancement features; that because scoring was subjective, the agency had no basis for the evaluators' conclusions; that the magnitude of the effort needed to caution about weaknesses in this area would have been administratively unmanageable; and that additional information would have distracted offerors from performance of the LTDs as a whole.

These arguments are unconvincing. The agency knew very well, during the LTDs, how scoring of productivity enhancement was proceeding; evaluators assigned scores at the end of each session, and those scores were promptly recorded in a log. Finding 23. The evaluators' scoresheets contain justifications for each score. Id. The magnitude of the effort involved in advising offerors of weaknesses in productivity enhancement pales in comparison with the size of the job of creating reports as to other weaknesses; there were fifty times as many other scores recorded, and offerors were told about all weaknesses in those areas. Findings 11, 19. Whether additional information would have proved distracting to offerors is a matter of speculation, and in any event, it is not an excuse for an agency not to follow its own ground rules for the conduct of a procurement.

The Navy's last defense is that Centel and FCC were not prejudiced by a lack of discussions because all offerors were treated equally. The case cited in support of this proposition is Lincoln Services, Ltd. v. United States, 230 Ct. Cl. 416, 678 F.2d 157 (1982). Respondent's Posthearing Brief at 25. This case does not say what the Navy alleges, and it also addresses a situation different in many material respects from the one now before us. In Lincoln Services, the Court did not enunciate a general rule, but merely found that in the specific circumstances presented, the agency's actions did not prejudice the plaintiff. 230 Ct. Cl. at 429, 678 F.2d at 164. This result is consistent with Board decisions. See, e.g., Diversified Systems, 88-3 BCA at 106,172, 1988 BPD 154, at 11; Richard S. Carson & Associates, Inc., GSBCA 9411-P, 88-2 BCA 20,778, at 104,983, 1988 BPD 93, at 8. Further, in Lincoln Services, the agency was alleged to have violated the requirements of a non-mandatory manual, rather than provisions of a statute, regulation, or solicitation. Additionally (though of no particular relevance to the issue at hand), the case involved a far more stringent burden of proof than is used at boards of contract appeals (arbitrary and capricious, rather than preponderance of the evidence).

Award decision

The solicitation announced that this procurement would culminate in the award of a contract to the offeror whose proposal was found by the source selection authority (SSA) to provide the "greatest value" to the Government. Award would be made to a "lower-priced proposal with less technical excellence, or to a higher-priced proposal with greater technical excellence," depending on the SSA's judgment. Finding 28. As understood in this procurement, "technical excellence" encompassed technical and management presentations. Finding 12. These factors were to be evaluated in accordance with a prescribed structure:

    Technical will account for 75 percent of the score and management will account for 25 percent.
    Although price will not be mathematically scored, price is a significant factor in the evaluation. Technical is the single most important factor and management is the least important factor.

Finding 29.

When confronted with this sort of arrangement, we have consistently held that the SSA's decision, in weighing one factor against another, "is proper if it is reasonable and consistent with the established evaluation and award factors." Computer Sciences Corp., GSBCA 11497-P, 92-1 BCA 24,703, at 123,298, 1992 BPD 6, at 34 (citing Hughes Advanced Systems Co., GSBCA 9601-P, 89-1 BCA 21,276, at 107,329, 1988 BPD 253, at 49, and Systems & Computer Technology Corp., GSBCA 8817-P, 87-2 BCA 19,703, at 99,761, 1987 BPD 34, at 19); see also Sonicraft, Inc. v. Defense Information Systems Agency, GSBCA 11750-P, 1992 BPD 182, at 37 (May 15, 1992); CRC Systems, Inc., GSBCA 9475-P, 88-3 BCA 20,936, at 105,797, 1988 BPD 136, at 8. Virtually all of the cases in this line construe solicitations, and actions stemming from them, which involve tradeoffs based on far more general factor weights than the ones involved here. Intergraph's expert witness mentioned three procurements as being like this one. In two of them -- Joint Staff Automation for the Nineties (JSAN) and Treasury Multi-User Acquisition Contract (TMAC) -- the solicitation said that the technical factor would be more important than cost, and in the third -- Reserve Component Automation System (RCAS) -- a combination of factors (including technical) was more important than cost. Transcript at 1062-65; Lockheed Missiles & Space Co. v. Department of the Treasury, GSBCA 11776-P et al., 1992 BPD 155, at 3 (June 2, 1992); Grumman Data Systems Corp. v. Department of the Air Force, GSBCA 11635-P, 92-2 BCA 24,999, at 124,595, 1992 BPD 100, at 8; Computer Sciences, 92-1 BCA at 123,282, 1992 BPD 6, at 3. Here, on the other hand, the Navy obligated itself to assign a specific degree of importance to price that would bear a specific relationship, over a range of values, to the technical and management factors. Consequently, it was especially important in this case for the agency to make a reasoned, documented analysis of which offeror actually offered the greatest value to the Government.

The Navy believed, on the basis of its scoring, that Intergraph's technical presentation and management proposal were superior to Centel's and FCC's. Findings 30, 31. It also believed that the cost which would result from acceptance of either protester's proposal would be less than the cost from acceptance of Intergraph's offer. Finding 32. The agency did not make any sort of analysis which balanced each factor against the others, however. Finding 40. Instead, it ignored the management factor (considering that all proposals were approximately equal in that regard), Finding 39; paid lip service to the calculated scores for technical presentations, Finding 33; and made a tradeoff which involved only two considerations, "productivity enhancement grades" and cost, Findings 34-36. This was at variance with the scheme established in the solicitation; it was therefore improper. 10 U.S.C.A. 2305(b)(4)(B) (West Supp. 1992); FAR 15.608(a). Furthermore, the way in which the analysis was performed essentially eliminated cost as a factor, too, so that none of the factors that were supposed to be balanced played any real part in the award decision.
This constituted a violation of FAR 15.605(b) ("price or cost to the Government shall be included as an evaluation factor in every source selection") as well. The resulting decision is left without any support.

The management factor was eliminated from consideration by an explicit choice. Finding 39. The technical factor was eliminated by the agency's determination to distinguish the values of proposals by dividing cost differences by one percent of the estimated twelve-year labor cost of using the CAD systems being acquired, and then comparing the result to the cardinal difference in "productivity enhancement grades." See Finding 36. (We restate this calculus symbolically below.) Nowhere in this calculus does the technical score appear. Indeed, whether any one offeror had scored twice as high as another -- or alternatively, a mere one point higher than its competitors -- would have made absolutely no difference in the analysis.

The cost factor was eliminated from consideration by the nature of the calculations performed. The "grades" (the merits of the derivation of which we do not comment upon) ranged from ____ percent to ____ percent. Using what might be called "Navy math," the agency calculated the difference between these numbers as ____ percent. (Actually, as explained above, the second number is __ percent greater than the first.) For the lower-priced offer to be of greater value than the higher-priced, under the Navy's analysis, the price difference divided by $_________ would have to be more than ____. Finding 36. The result is $___._ million. The evaluated costs proposed in this procurement were in a range from $___ million to $___ million. Finding 32. Thus, even if the offeror with the lowest productivity enhancement grade had virtually offered to give away its system to the Navy, gratis, under the agency's analysis, the other firm's proposal would have had greater value. This is not the sort of rational, documented assessment necessary to justify an award to a higher-priced offeror. Grumman, 92-2 BCA at 124,605, 1992 BPD 100, at 38; Oakcreek Funding Corp., GSBCA 11244-P, 91-3 BCA 24,200, at 121,041, 1991 BPD 156, at 9.

The Navy and Intergraph assert that the agency's greatest value analysis was tenable because it was based on an appraisal of the effect that acceptance of each proposal would have on the productivity of the people who would be using the systems. These parties remind us that we have found such appraisals to be rational justifications for greatest value awards. Lockheed, 1992 BPD 155, at 5, 21; Computer Sciences, 92-1 BCA at 123,298, 1992 BPD 6, at 34-35. The problem for the Navy and Intergraph is that although the agency in this case made what it called a "productivity analysis," that analysis was not of productivity itself, but rather of grades that were based on scores for productivity enhancement features demonstrated at LTDs. We have been given no reason to believe that there is any necessary correlation between true productivity savings and these grades. The Navy itself recognized in performing its analysis that the grades are "not intended to be an absolute measure of productivity, but rather a relative measure." Finding 37; see also Finding 38. Agency officials acknowledged that in measuring productivity enhancement features, they were not measuring productivity. Finding 42.
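The calculus described above can be restated symbolically; the notation is ours, and the redacted figures are unaffected. Under the Navy's analysis, the lower-priced proposal represented the greater value only if

\[
\frac{P_{\text{high}} - P_{\text{low}}}{0.01 \, L} \;>\; g_{\text{high}} - g_{\text{low}},
\]

where P denotes an offeror's evaluated price, L the estimated twelve-year labor cost of using the systems, and g a proposal's "productivity enhancement grade" in percentage points. The break-even price difference is thus 0.01 x L x (g_high - g_low), a figure which, on this record, exceeded the entire spread of evaluated costs; and the technical scores appear nowhere in the inequality.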
In reality, as explained by Centel's expert witness, the number and quality of these features -- or tools, as they are better described -- are only two of the factors involved in assessing productivity associated with a computer system. Finding 43. Some of these tools -- like the one cited by Intergraph's proposal manager, the "place text" command -- are not associated with productivity so much as with diversity of use. Finding 45. Many other tools are helpful to productivity only under certain circumstances. For example, as Centel's expert testified, a mouse or other pointing device -- even if it is a truly superlative pointing device -- is of no particular use in creating text; a keyboard is much better suited for that activity. Finding 46. The Navy made no effort to distinguish which tools actually help the users do their jobs faster and better. Finding 44. Nor did it attempt to distinguish which tools were more useful for novices than expert users of a system; and ultimately, it valued features especially highly because they would enhance the work of novices, though it expected most users of the system to be highly experienced. Findings 47, 48.

Prejudice

Intergraph has made a strenuous effort to persuade us that because its proposal was "head and shoulders" better than those of its two competitors, even if we find that the Navy has violated laws in the conduct of the procurement, we should allow the award to stand. FCC offers a diametrically opposed view: "What the final proposal scoring would have been, what the offered prices would have been, and which offeror would have been selected for award, had the Navy complied with [the solicitation provision mandating discussions of weaknesses], is indeterminate." FCC's Posthearing Brief at 3-4.

In light of our holding that the underpinnings of the source selection decision are fatally defective, the decision cannot be permitted to stand. Were that all that was wrong with the Navy's conduct of the procurement, however, we might simply direct the agency to examine anew whether Intergraph's apparent technical superiority is worth its proposal's higher cost. Grumman, 92-2 BCA at 124,609, 1992 BPD 100, at 47; International Business Machines Corp., GSBCA 11359-P et al., 1991 BPD 227 (Sept. 25, 1991), reconsideration denied, 92-1 BCA 24,438, 1991 BPD 273. An appraisal of prejudice as to the other errors committed by the Navy is appropriate to see whether, under any scenario that would have been possible had the errors not been made, an award to Intergraph could be justified. See, e.g., Diversified Systems, 88-3 BCA at 106,172, 1988 BPD 154, at 11; Carson, 88-2 BCA at 104,983, 1988 BPD 93, at 8. Of course, as FCC says, these results are necessarily indeterminate. To ensure that no violations taint the hypothesized outcome, however, we paint the picture in the way most beneficial to the protesters. Thus, insofar as scoring is involved, we adjust the protesters' scores upward to the maximum conceivable under the circumstances. Corporate Jets, 91-2 BCA at 120,119, 1991 BPD 111, at 19-20; Carson, 88-2 BCA at 104,983, 1988 BPD 93, at 8.

With regard to the errors in assigning weights to the various elements of the LTD, a rescoring is relatively simple. We have recalculated the scores in accordance with the way in which they should have been given importance, assuming that five percent is a reasonable proxy for "slightly more significant" in discriminating between the weights of the production and technical features tests. Finding 14(c).
We emphasize that in making this recalculation, we have changed none of the scores assigned by Navy evaluators. Finding 14. This action alone closes the total technical score gap between Intergraph and each of the protesters by about _____ percent. Finding 14(d).

With regard to the error of not discussing with offerors weaknesses in LTD productivity enhancement scores, we have taken testimony that if the protesters had had the benefit of discussions, they could have taken actions to improve their systems and the demonstration of those systems. Finding 27. While we recognize that the testimony is speculative and self-serving, we find that it is nonetheless credible. The extent to which it is true is of course uncertain. Still, accepting it to some extent is preferable to accepting Intergraph's contention that Centel and FCC could not have improved their scores at all because they were already trying as hard as they could, and so by definition could not try harder. Intergraph's Posthearing Brief at 12. Under Intergraph's standard, discussions would never serve any purpose because offerors would not be able to improve their initial proposals.

We have recalculated the LTD scores on the assumption that each protester was made aware of each instance in which it received a low score for a subfactor, and then achieved a score of three for that subfactor on retest. Our assumption as to rescoring is highly unlikely to be borne out in reality; that every new mark would be ideal is doubtful, and that Intergraph could not improve any of its scores after learning about weaknesses is also questionable. We have made the assumption, however, only to find the outside limit of any advantage the protesters could achieve from elimination of the agency's failure to observe its own rules for conducting the LTD. The result is that each protester's total technical score is higher than Intergraph's. Finding 26.

In our recalculations, we have not adjusted the scores given by the Navy for either written technical proposals or management proposals. Neither protester alleged any violations of law which pertain to the technical proposals. Both alleged violations in the conduct of discussions regarding management proposals, but neither has persuaded us that the allegations are true. No evidence was presented on this matter at hearing. In its posthearing brief, FCC notes that the SSEB found weaknesses in its management proposal, but does not point to any evidence that no discussions were had as to these weaknesses. FCC's Posthearing Brief at 11. Centel also notes, in its brief, that the SSEB found weaknesses in its management proposal. Centel additionally asserts that only fourteen of the discrepancy reports given it (see Finding 19) pertained to the management proposal; that these do not pertain to the weaknesses identified in the SSEB report; and that the problems identified in the SSEB report were in relation to solicitation requirements, rather than competitors' proposals. Centel's Posthearing Brief at 17-18. In response, the Navy says that in the areas cited by Centel, the offeror earned points and therefore was not entitled to discussions. The agency apparently agrees that Centel's management proposal did contain weaknesses about which discussions were not held, but maintains that these weaknesses were in comparison to the proposals submitted by other offerors. Navy's Posthearing Brief at 26. We have insufficient evidence on the basis of which we might resolve the dispute.
We therefore find that the protesters have failed to meet their burden of proving that the agency's failure to engage in discussions about weaknesses in management proposals was a violation of law.

We have also made no adjustments in pricing proposals. We recognize that if the protesters had been told about weaknesses in their productivity enhancement scores, they might have lowered their prices to compensate for these problems. See Finding 27. Lowering prices would not be an appropriate response, however, if all the low scores were eliminated. Our rescoring of the LTDs assumes the maximization of scores for the subfactors in question. Considering lowered prices would double count the effect of the failure to engage in discussions about the weaknesses.

Armed with our new technical scores, we have a new set of scores for use in making the tradeoff mandated by the solicitation:

    Evaluation factor         Centel    FCC    Intergraph
    Technical                   ___      ___       ___
    Management                  ___      ___       ___
    "Technical excellence"     ____     ____      ____

Price is now to be given a weight less than technical (for which the maximum possible is 937.5) and more than management (maximum, 312.5). Findings 13, 29, 31. How can price be compared to the other two factors? The cases say that in making this sort of tradeoff, price need not be assigned a numerical weight, and the solicitation says that in this procurement, "price will not be mathematically scored." Sonicraft, 1992 BPD 182, at 35; Finding 29. In assessing prejudice, however, we do not have the requisite knowledge for making a judgment based on anything other than a mathematical analysis. We therefore use such an examination for the purpose of gaining a basis from which a decisionmaker might begin in contemplating the tradeoff. We proceed in accordance with the SSAC chairperson's view that even if offerors' prices are not scored against a preexisting scale, they may be compared in some other sort of mathematical way. See Finding 40.

To get an idea of total scores if price is given the least possible weight, we assign 313 points (just higher than the 312.5 available for the management factor) to the offeror with the lowest present value, life cycle cost. We then assign a percentage of that score to each of the other offerors, based on the fractional relationship between the lowest evaluated cost and that firm's figure. The result is that Intergraph has only ____ points -- much less than ______'s ____ (though more than ______'s ____). These point scores need not be strictly adhered to, but they are a useful guide to the SSA in deciding which proposal merits award. Sonicraft, 1992 BPD 182, at 37 (citing Bendix Field Engineering Corp., B-241156, 91-1 CPD 44, at 5 (Jan. 16, 1991), and RMY v. United States, 693 F. Supp. 1232, 1245 (D.D.C. 1988)). Given the disparity of ______________________ between Intergraph's score and the leader's, however, as well as the fact that this comparison assumes the significance of price which is most favorable to Intergraph, we cannot conclude that in spite of the Navy's errors, the contract should have been awarded to Intergraph. In this regard, we note that under a different rescoring of proposals, which assumes that Intergraph as well as Centel and FCC is able to convert all low LTD productivity enhancement scores to the maximum possible, Intergraph still finishes ______ in overall score even if price is given the least permissible weight. See Findings 14, 26 n.3, 31.
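The price-comparison method just described can be stated compactly; again the notation is ours, and the redacted point totals are unaffected:

\[
S_i = W \cdot \frac{C_{\min}}{C_i},
\]

where C_i is offeror i's evaluated present value, life cycle cost; C_min is the lowest such cost among the three offerors; and W is the point weight given price -- 313 when price is accorded the least permissible significance, or 937 when it is accorded the greatest (a figure used in the calculation that follows).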
Of course, as price is given increasing significance -- and the solicitation permits it to be considered as almost equal in importance to technical -- Intergraph falls farther behind, no matter which assumptions are used.

In assessing the reasonability of the Navy's making any award against existing proposals, we also need to be concerned with possible prejudice to Intergraph from the failure to engage in discussions about weaknesses in productivity enhancement features. For this purpose, we have made another calculation to see whether award might be made, without further competition, to either Centel or FCC. This time we assume that Intergraph is able to benefit fully from the discussions, but neither of the other offerors is able to improve its scores at all. The result is that "technical excellence" scores are as follows: Centel, ___; FCC, ___; and Intergraph, ____. Findings 14(c), 26 n.3, 31. If price is given the maximum possible weight (937 points, just less than technical) and price proposals are compared by using the ratios described above, _______ achieves the highest total, ____; Intergraph has ____. (______ has only ____.) Thus, if the Navy were to decide that it values price virtually as much as technical, it could conceivably justify an award to ______. We do not place ourselves in the SSA's shoes; we do not determine that price should be given such a great weight or that such an award is justified. We do not require an award to ______, or even intimate that it should be made. We hold only that an award to ______ without further discussions could under certain circumstances be acceptable, and we leave to the SSA the determination of whether it is appropriate.

We are cognizant, in reaching our decision as to the appropriate form of relief to be ordered, of the fact that the Navy has spent millions of dollars in conducting this procurement, including the better part of a million dollars in running the three LTDs. Transcript at 870-71. Obviously, if the procurement is to continue, even more funds will be devoted to the project. The expenditure of a large sum of money is not a license to violate procurement laws, however. Each offeror has also spent a considerable amount of money in seeking the contract. See, e.g., Transcript at 1039. If companies cannot have confidence, before embarking on such an extensive competition, that the Government will live by its own rules, how many firms will want to do business with the Government? And then how much will the few firms that do choose to offer goods and services be able to charge the taxpayers? This decision is consistent with the statutory command that in deciding protests, we "accord due weight to the policies of [the Brooks Act] and the goals of economic and efficient procurement set forth in [it]." 40 U.S.C. 759(f)(5)(A) (1988).

Decision

The protests are GRANTED. The Navy shall terminate for the convenience of the Government the contract it awarded to Intergraph. The agency may then continue the procurement, acting in accordance with the dictates of statutes and regulations. If the SSA determines that price should be weighed at the upper extreme of the bounds permissible under the solicitation, and additionally concludes (based on a reasoned analysis) that the proposal submitted by ______ is most advantageous to the Government, he may award the contract to that offeror.
If he comes to any other conclusion, he may not award a contract to any offeror without first, at a minimum, (a) having the contracting officer advise each offeror of weaknesses in its productivity enhancement features; (b) permitting each of the three firms to submit another best and final offer; and (c) having agency evaluators evaluate those proposals and make sound recommendations regarding award.


                                        _________________________
                                        STEPHEN M. DANIELS
                                        Board Judge

We concur:


_________________________               _________________________
VINCENT A. LaBELLA                      ROBERT W. PARKER
Acting Chief Board Judge                Board Judge