System, Component, and Operationally-Relevant Evaluations (SCORE)
SCORE
(System, Component and Operationally Relevant Evaluations) is a unified
set of criteria and software tools for defining a performance evaluation
approach for complex intelligent systems. It provides a comprehensive
evaluation blueprint that assesses the technical performance of a system
and its components through isolating and changing variables as well
as capturing end-user utility of the system in realistic use-case environments.
SCORE is unique in that:
- It is applicable to a wide range of technologies, from manufacturing to defense systems
- Elements of SCORE can be decoupled and customized based upon evaluation goals
- It can evaluate a technology at various stages of development, from conceptual to full maturation
- It combines the results of targeted evaluations to produce an extensive picture of a system's capabilities and utility
Intelligent systems tend to be complex and non-deterministic, involving numerous
components that work together to accomplish some overall goal. Existing
approaches to measuring such systems often focus on evaluating the system
as a whole, or on evaluating some of its components individually under very
controlled, but limited, conditions. These approaches do not comprehensively
and quantitatively assess the impact of environmental variables (e.g.,
lighting, external distances) and system variables (e.g., processing power,
memory size) on the system's overall performance. Through its comprehensive
evaluation criteria and software tools, the SCORE framework has greatly
enhanced the ability to quantitatively and qualitatively evaluate intelligent
systems at both the component level and the system level in operationally
relevant environments.
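The idea of isolating one variable and changing it while the others are held fixed can be sketched in a few lines of Python. This is only an illustration and not part of the SCORE software tools: the function `run_trial`, the variable names, the baseline values, and the toy scoring formula are all hypothetical stand-ins for measurements of a real system.

```python
from statistics import mean

# Hypothetical baseline conditions; names and values are illustrative only.
BASELINE = {"lighting_lux": 500, "distance_m": 5, "memory_mb": 1024}


def run_trial(conditions):
    """Stand-in for one measured trial of a system; returns a score in [0, 1].

    A real evaluation would exercise the actual system here. This toy
    formula exists only so that the sketch is runnable.
    """
    light = min(conditions["lighting_lux"] / 500, 1.0)
    dist = min(5 / conditions["distance_m"], 1.0)
    mem = min(conditions["memory_mb"] / 1024, 1.0)
    return round(light * dist * mem, 3)


def sweep(variable, values, trials=3):
    """Vary one variable while holding all others at their baseline values."""
    results = {}
    for value in values:
        conditions = dict(BASELINE, **{variable: value})
        results[value] = mean(run_trial(conditions) for _ in range(trials))
    return results


if __name__ == "__main__":
    # One environmental variable per sweep; everything else stays at baseline.
    for variable, values in [
        ("lighting_lux", [50, 250, 500]),
        ("distance_m", [5, 10, 20]),
    ]:
        print(variable, sweep(variable, values))
```

Because only one entry of the conditions changes per sweep, any difference in the averaged score can be attributed to that variable. Repeating each condition several times matters for a real non-deterministic system, even though the toy `run_trial` here is deterministic.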
Applications
SCORE was initially applied to intelligent systems developed under the
DARPA (Defense Advanced Research Projects Agency) ASSIST
and TRANSTAC programs. The SCORE-based
evaluations gave researchers and end users the information
they needed to determine if and when the technology would be ready
to be put to use. SCORE allowed developers to identify the key
components of a system and evaluate them both independently and as
a whole, which helped determine the impact of individual components
on the performance of the overall system. This detailed analysis made
it possible to target more accurately the aspects of the systems shown
to provide the greatest benefit to the overall advancement of the technology,
and therefore helped identify where program funding should be
applied to get the most bang for the buck.
Framework
[Figure: SCORE Framework diagram, available as a downloadable PowerPoint Show]
Design Elements
Factors that must be considered when planning a technology evaluation, and
that are driven by system development status and program goals, include:
- Identification of the system or component to be assessed
- Definition of the goal(s), objective(s), metrics/measures
- Specification of testing environment (system maturity & intended use, physical environment factors, site suitability, and availability)
- Identification of participants (system users and actors)
- Specification of required participant training
- Specification of data collection methods
- Specification of the use-scenarios to challenge the evaluation system
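One way to keep track of these design elements while planning is to record them in a single structure and check which ones are still unspecified. The sketch below is a hypothetical illustration, not a format defined by SCORE: the class `EvaluationPlan`, its field names, and the example values are all assumptions made for this example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EvaluationPlan:
    """One field per design element listed above; all names are illustrative."""

    target: str = ""  # system or component to be assessed
    goals_and_metrics: list = field(default_factory=list)
    test_environment: str = ""
    participants: list = field(default_factory=list)  # system users and actors
    participant_training: str = ""
    data_collection_methods: list = field(default_factory=list)
    use_scenarios: list = field(default_factory=list)

    def missing_elements(self):
        """Return the names of design elements that have not been filled in."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    # Hypothetical, partially completed plan.
    plan = EvaluationPlan(
        target="speech-to-speech translation system",
        use_scenarios=["vehicle checkpoint dialogue"],
    )
    print(plan.missing_elements())
```

A check like `missing_elements` makes it easy to see, before an evaluation is scheduled, which planning decisions remain open.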
The SCORE framework has proven to be widely applicable and
equally relevant to technologies ranging from manufacturing to
military systems. It has been applied to the evaluation of technologies
in DARPA programs that range from soldier-worn
sensor technology that enhances battlefield awareness for soldiers on
patrol to speech-to-speech translation systems
that automatically translate between English and Arabic spoken utterances.
It is also currently being applied to assessing the control of autonomous
vehicles on a shop floor. The SCORE framework has been applied to eight
week-long evaluations (involving over 60 personnel at each evaluation)
assessing the performance of technologies developed by twelve independent
research teams under these two DARPA programs, yielding results
at the level of detail described throughout this write-up. Additionally,
SCORE is being applied to the Virtual
Manufacturing Automation Competition (VMAC).
Demonstrated Impact/Accomplishments Exceeds Performance Expectations
The impact of this work has been far-reaching and substantial, as shown
by the following:
- The SCORE framework has been adopted by multiple programs within DARPA, which has greatly enhanced their ability to quantitatively and qualitatively evaluate intelligent systems at multiple levels.
- The approaches used in SCORE are starting to redefine the way that performance evaluation is performed on intelligent systems. As a result of the DARPA evaluations, the SCORE Evaluation Team has been asked to advise other programs on how to apply the techniques for their purposes.
- Research teams are starting to use the SCORE evaluation approach to evaluate their own systems. One researcher stated, "We switched to NIST's evaluation procedures because we found them superior to our own."