Integrating AI into Government Agencies

Overview

MetroStar launched its partnership with the Joint AI Test Infrastructure Capability (JATIC) program, a pivotal initiative aimed at bridging the gap between the potential of artificial intelligence (AI) and its practical application within the Department of Defense (DoD). The program enhances the safety, efficiency, and effectiveness of military operations by providing a vital platform for testing AI technologies across the DoD, the Chief Digital and Artificial Intelligence Office (CDAO), and the AI Assurance Directorate.

Our engagement in JATIC is underscored by our early investment in AI through our Innovation Lab, foresight that placed us at the forefront of AI integration within defense systems. Our proprietary data experimentation platform, Onyx, supports the initiative: built entirely on free and open-source technologies, it allows users to retain full data rights, offering unmatched flexibility and cost-effectiveness.

“By bridging the gap between AI potential and practical utilization, we’ve played a pivotal role in advancing national security objectives and future-proofing defense capabilities in an ever-evolving threat landscape.”

Background

Aligned with the DoD’s strategic imperative to incorporate AI technologies into operational domains, MetroStar’s efforts were rooted in strategic documents and annual goals outlined by the DoD.

The significance of this alignment was underscored by the DoD’s investment of $14.7 billion in AI R&D, reflecting a commitment to future-proofing defense capabilities. Moreover, the mission impact of the work was profound, as it directly contributed to ensuring AI-enabled systems (AIES) underwent rigorous testing, evaluation, and validation before deployment.

By partnering with the Test and Evaluation (T&E) community, JATIC facilitated the development of best practices for T&E of AI, thereby mitigating mission risks associated with AI system errors or unintended use. This collaborative effort propelled the DoD toward harnessing the significant potential of AI technologies while ensuring operational reliability and trustworthiness.

The DoD's mission depends on critical technologies such as AI to meet evolving needs and stay ahead of near-peer adversaries. The emphasis on “trusted AI” by the Undersecretary for Research and Engineering highlights the importance of ensuring reliable AI-enabled systems. By enhancing the testing, evaluation, and trustworthiness of these systems, JATIC's work directly supports the DoD's mission, enabling the reliable deployment of AI technologies across critical defense applications.

JATIC Product Focus Areas

Model Performance

Metrics to Assess Model Performance on Labeled Datasets

  • Accuracy, Precision, Recall, F1, mAP
  • Probability Calibration Metrics
  • Risk-Based Metrics or Metric Alterations
  • Bias & Fairness Metrics
  • Performance Metrics Across Subclasses
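
As a concrete illustration of the first two bullets, here is a minimal sketch of computing accuracy, precision, recall, F1, and one simple probability-calibration metric (the Brier score) with scikit-learn; the labels and predictions are placeholder values, not real evaluation data:

```python
# Minimal sketch: core performance metrics on a labeled dataset with
# scikit-learn. y_true, y_pred, and y_prob are placeholder values.
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    brier_score_loss,
)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]           # ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]           # model class predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # predicted P(class=1)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
# Brier score is one basic probability-calibration metric.
brier = brier_score_loss(y_true, y_prob)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} brier={brier:.3f}")
```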

Natural Robustness

Generate Realistic Image Corruptions In-Silico to Assess Model Robustness

  • Environmental Effects (Rain, Snow)
  • Sensor Corruptions (Blur, Focus)
  • Digital Corruptions (Gaussian Noise, Compression)
  • User-Directed Shifts (See: “Dataset Interfaces” / Madry)
  • Synthetic Environment
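
A minimal sketch of generating a few of these corruptions in-silico with Pillow and NumPy; "input.png" is a placeholder path, and a production pipeline would use a purpose-built corruption library rather than hand-rolled transforms:

```python
# Minimal sketch: in-silico image corruptions (blur, Gaussian noise,
# JPEG compression) for robustness probing. "input.png" is a placeholder.
import io
import numpy as np
from PIL import Image, ImageFilter

img = Image.open("input.png").convert("RGB")

# Sensor-style corruption: defocus approximated with Gaussian blur.
blurred = img.filter(ImageFilter.GaussianBlur(radius=3))

# Digital corruption: additive Gaussian noise.
arr = np.asarray(img, dtype=np.float32)
noisy = np.clip(arr + np.random.normal(0, 25, arr.shape), 0, 255)
noisy = Image.fromarray(noisy.astype(np.uint8))

# Digital corruption: lossy JPEG compression at low quality.
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=10)
compressed = Image.open(buf)
```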

Dataset Analysis

Techniques to Manipulate, Understand, & Validate Datasets

  • Dataset Splitting (Train/Test/Val)
  • Dataset Validation Checks
  • Metrics for Dataset Drift, Similarity, & Comparison
  • Metrics for Dataset Complexity (Irreducible Error)
  • Anomaly & Outlier Detection
  • Dataset Bias Detection
  • Label Error Detection
  • Data Sufficiency
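
A minimal sketch of two of these techniques, a train/validation/test split and a simple per-feature drift check (two-sample Kolmogorov-Smirnov test), using scikit-learn and SciPy on placeholder data:

```python
# Minimal sketch: dataset splitting plus a per-feature drift check.
# X and y are synthetic placeholders for a real, versioned dataset.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)

# Two-stage split: 70% train, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Drift check: compare each feature's train vs. test distribution.
for i in range(X.shape[1]):
    stat, p = ks_2samp(X_train[:, i], X_test[:, i])
    print(f"feature {i}: KS={stat:.3f}, p={p:.3f}")
```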

Model Analysis

Techniques to Better Understand Model Behaviors & Limitations

  • XAI Saliency Maps
  • Clustering of Model Errors
  • Quantitative Metrics for Saliency Maps
  • Competency
  • Concept Probing, Deep Inversions
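
A minimal sketch of the first bullet, a vanilla gradient saliency map in PyTorch; the model here is a stand-in for a trained classifier, and real XAI workflows would typically use an attribution library such as Captum:

```python
# Minimal sketch: vanilla gradient saliency map in PyTorch.
# The Sequential model is a placeholder for a trained image classifier.
import torch

model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
)
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)
logits = model(image)
score = logits[0, logits.argmax()]   # score of the predicted class
score.backward()

# Saliency: max absolute input gradient across color channels.
saliency = image.grad.abs().max(dim=1).values.squeeze(0)  # shape (32, 32)
```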

Adversarial Robustness

Assess Model Robustness to Adversarial Attack

  • Patch Attacks
  • Physically Realizable Attacks
  • Attack Metrics
  • Auto Attack (And Similar)
  • White Box Attacks
  • Effects of Attacks / Defenses on Model Performance (Trade-Offs)
  • Black Box Attacks
  • Data Poisoning
  • Attack Cards
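
A minimal sketch of one classic white-box attack, the Fast Gradient Sign Method (FGSM), in PyTorch; the model is a placeholder, and toolkits such as the Adversarial Robustness Toolbox provide hardened implementations of this and many of the attack classes above:

```python
# Minimal sketch: FGSM, a basic white-box adversarial attack.
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03):
    """Return an adversarial example within an L-infinity ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to valid range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Placeholder classifier and input; a real assessment targets a fielded model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
x_adv = fgsm(model, x, label)
```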

Model Cards, Data Cards, and T&E Reports

Generate Useful Artifacts from Model & Data T&E for Traceability & Reporting

  • Ingest Various T&E Metrics
  • Compatible with MLOps Tools
  • Machine Readable Model + Data Card
  • Graph / Viz Generation
  • PPTX Report Generation
  • Reports / Results Merged into Reporting Requirements
  • Policy Compliance Mapping
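
A minimal sketch of a machine-readable model card serialized to JSON; the fields and values are illustrative placeholders, not a mandated DoD schema:

```python
# Minimal sketch: a machine-readable model card written to JSON.
# All field names and values are illustrative.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str
    training_data: str
    metrics: dict = field(default_factory=dict)
    limitations: list = field(default_factory=list)

card = ModelCard(
    name="vehicle-detector",                  # hypothetical model name
    version="1.2.0",
    intended_use="Overhead vehicle detection for T&E experimentation",
    training_data="internal-dataset-v3",      # hypothetical dataset reference
    metrics={"mAP": 0.71, "recall": 0.83},    # placeholder results
    limitations=["Degrades in heavy rain or snow"],
)

with open("model_card.json", "w") as f:
    json.dump(asdict(card), f, indent=2)
```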

AI T&E Platform

Orchestrated Set of MLOps Tools, Tailored to Jumpstart AI T&E Speed & Maturity

  • MLflow
  • Experiment Tracking
  • Model Registry
  • Viz Dashboard
  • Example Workbooks & Workflows
  • JupyterLab / IDE
  • Workflow Orchestrator
  • Database / Object Store
  • Result Introspection
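
A minimal sketch of the experiment-tracking piece using MLflow's tracking API; the experiment name, parameters, and metric values are placeholders:

```python
# Minimal sketch: logging a T&E run to MLflow's experiment tracker.
# Assumes a local or remote MLflow tracking server is configured.
import mlflow

mlflow.set_experiment("ai-te-robustness-sweep")   # placeholder experiment name

with mlflow.start_run(run_name="gaussian-noise-severity-25"):
    mlflow.log_param("corruption", "gaussian_noise")
    mlflow.log_param("severity", 25)
    mlflow.log_metric("accuracy", 0.87)           # placeholder result
    mlflow.log_metric("f1", 0.84)
```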

Target Roles to Support within the ML Lifecycle

AI T&E Engineer

  • Measure Performance for Various Settings/Contexts
  • Verification of Requirement Satisfaction
  • Validation of Evaluation Results
  • Quantitative Evaluation of Risks
  • Explore Potential Unknown Risks to Identify New Risks
  • Develop Summary Report of Findings
  • Compare Models

ML Developer

  • Exploratory Data Analysis
  • Statistical Analysis
  • Data Visualization & Analysis
  • Model Experimentation
  • Feature Engineering
  • Model Training & Continuous Validation
  • Uses MLOps
  • Uses End-to-End Model Pipelines
  • Shifted-Left Risk Evaluation

Data Analyst

  • Data Visualization & Analysis
  • Statistical Analysis
  • Data Validation
  • Data Prep & Cleaning

AI Red Teamer

  • Perform Adversarial Assessments of AI Models
  • Model Building with Proxy Data
  • Threat Analysis Tree
  • Find (Non-Adversarial) Failure Modes
  • Often Have Limited Model Information
  • Generation of Attack Cards
  • Lessons Learned Fed to Developer Team

Other Roles

  • MLOps Engineer
  • Certifier / Acquisition Decision-Maker
  • Leadership
  • Model End-User
  • Software Engineer
  • Risk Assessor
  • Subject Matter Experts
  • Security
  • Authorizing Official (AO)
  • Mission Owner
  • Troubleshooters & Operators

Key Objectives

Deliverables & Value Proposition

Measures of Success and How We Are Enabling AI in Government

  • Apps Graduated from DARPA & Closed Projects to Fully Open-Source in Less than One Year
  • New or Improved Open-Source Packages Serving the Entire Data Science Community
  • Phase 2 Will Deliver IL5 Accreditation to Meet the Need for a Federated MLOps & AI T&E Platform
  • Active Users & Testers Internal to the Government Prototype Playground

DEVELOP

Develop the T&E Platform

  • Data Hub: Database / Object Store / Data Versioning
  • Model Hub: AI/ML Model Registry
  • Support T&E Tools
  • Workflow Orchestration Engine / Workflow & Experiment Tracking
  • IDE / Interactive Python Notebooks; Python Environment Management

SIMPLIFY

Simplify the Deployment of the Platform

  • Automatic Infrastructure Provisioning
  • Single Interface Environment Management
  • Support Local Machine, On-Premise, and On-Cloud
  • Multi-GPU & Resource Management

ENABLE

Enable T&E Engineers to Use the Platform and Libraries with Ease

  • Configure and Run Tests
  • Visualization / Dashboards
  • IDE / Interactive Python Notebooks
  • Report Generation
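
A minimal sketch of the "configure and run tests" workflow: a declarative YAML config drives a set of pass/fail metric checks. The config keys and the evaluate helper are hypothetical illustrations, not a JATIC API:

```python
# Minimal sketch: config-driven test execution for a T&E engineer.
# The config schema and evaluate() helper are hypothetical.
import yaml  # provided by the PyYAML package

CONFIG = """
model: vehicle-detector:1.2.0
dataset: internal-dataset-v3
tests:
  - metric: accuracy
    threshold: 0.85
  - metric: recall
    threshold: 0.80
"""

def evaluate(model, dataset, metric):
    # Placeholder: a real harness would load the model, run inference on
    # the dataset, and compute the requested metric.
    return {"accuracy": 0.87, "recall": 0.83}[metric]

cfg = yaml.safe_load(CONFIG)
for test in cfg["tests"]:
    score = evaluate(cfg["model"], cfg["dataset"], test["metric"])
    status = "PASS" if score >= test["threshold"] else "FAIL"
    print(f"{test['metric']}: {score:.2f} ({status})")
```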

Implementation Strategy

Strategic Partnerships

Formed alliances with key DoD programs to leverage expertise and ensure alignment with operational objectives.

Leveraging AI Expertise

Engaged with Naval Information Warfare Centers to solicit feedback and integrate AI capabilities effectively.

User-Driven Development Cycles

Prioritized user needs and feedback to iteratively refine AI products, ensuring alignment with operational requirements.

Continuous Improvement

Planned subsequent increments to tackle evolving challenges and incorporate emerging technologies over time.

Impacts + Achievements

Before: The DoD lacked a cohesive framework for the T&E of AI-enabled systems (AIES), making it difficult to assess their readiness and reliability for defense applications. JATIC products had few users and little demand, and many were not easily available or interoperable. Adoption of AI technologies was further hampered by the absence of standardized T&E processes; by a lack of centralized expertise, knowledge, and investment; and by uncertainty about novel risks, all of which posed risks to mission-critical operations.

After: With SEPTAR, the DoD now has a structured approach to T&E for AIES, potentially reducing the time and resources required to validate AI systems for operational use. The initiative has produced best practices aligned with DoD processes, enhancing the trustworthiness and reliability of AIES in defense operations. User demand for JATIC products is growing among major DoD AI programs, with available, interoperable products and improved documentation indicating a shift toward greater maturity and usability, and efforts are underway to further expand user adoption and feedback.

Accelerate Your Mission with Meaningful and Targeted Digital Transformations

Learn more about how MetroStar’s playbooks, accelerators, and industry experts can help move your organization’s mission forward.

Work With Us