Integrating AI into Government Agencies

Overview

MetroStar launched its partnership with the Joint AI Test Infrastructure Capability (JATIC) program, a pivotal initiative aimed at bridging the gap between the potential of artificial intelligence (AI) and its practical application within the Department of Defense (DoD). The program enhances the safety, efficiency, and effectiveness of military operations by providing a vital platform for testing AI technologies across the DoD, the Chief Digital and Artificial Intelligence Office (CDAO), and the AI Assurance Directorate.

Our engagement in JATIC is underscored by our early investment in AI through our Innovation Lab, foresight that placed us at the forefront of AI integration within defense systems. Our proprietary data experimentation platform, Onyx, supports the initiative: built entirely on free and open-source technologies, it allows users to retain full data rights, offering unmatched flexibility and cost-effectiveness.

“By bridging the gap between AI potential and practical utilization, we’ve played a pivotal role in advancing national security objectives and future-proofing defense capabilities in an ever-evolving threat landscape.”

Background

Aligned with the DoD’s strategic imperative to incorporate AI technologies into operational domains, MetroStar’s efforts were rooted in strategic documents and annual goals outlined by the DoD.

The significance of this alignment was underscored by the DoD’s investment of $14.7 billion in AI R&D, reflecting a commitment to future-proofing defense capabilities. Moreover, the mission impact of the work was profound, as it directly contributed to ensuring AI-enabled systems (AIES) underwent rigorous testing, evaluation, and validation before deployment.

By partnering with the Test and Evaluation (T&E) community, JATIC facilitated the development of best practices for T&E of AI, thereby mitigating mission risks associated with AI system errors or unintended use. This collaborative effort propelled the DoD toward harnessing the significant potential of AI technologies while ensuring operational reliability and trustworthiness.

The DoD's mission depends on critical technologies such as AI to meet evolving needs and stay ahead of near-peer adversaries. The emphasis on “trusted AI” by the Undersecretary for Research and Engineering highlights the importance of ensuring reliable AI-enabled systems. By enhancing the testing, evaluation, and trustworthiness of these systems, JATIC's work directly supports the DoD's mission, enabling the reliable deployment of AI technologies across critical defense applications.

JATIC Product Focus Areas

Model Performance

Metrics to Assess Model Performance on Labeled Datasets

  • Accuracy, Precision, Recall, F1, mAP
  • Probability Calibration Metrics
  • Risk-Based Metrics or Metric Alterations
  • Bias & Fairness Metrics
  • Performance Metrics Across Subclasses
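
As a concrete illustration of the first two bullets, here is a minimal sketch of computing accuracy, precision, recall, F1, and one simple probability-calibration metric (the Brier score) with scikit-learn; the labels and predictions are placeholder values, not real evaluation data:

```python
# Minimal sketch: core performance metrics on a labeled dataset with
# scikit-learn. y_true, y_pred, and y_prob are placeholder values.
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    brier_score_loss,
)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]           # ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]           # model class predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # predicted P(class=1)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
# Brier score is one basic probability-calibration metric.
brier = brier_score_loss(y_true, y_prob)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} brier={brier:.3f}")
```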

Natural Robustness

Generate Realistic Image Corruptions In-Silico to Assess Model Robustness

  • Environmental Effects (Rain, Snow)
  • Sensor Corruptions (Blur, Focus)
  • Digital Corruptions (Gaussian Noise, Compression)
  • User-Directed Shifts (See: “Dataset Interfaces” / Madry)
  • Synthetic Environment
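
A minimal sketch of generating a few of these corruptions in-silico with Pillow and NumPy; "input.png" is a placeholder path, and a production pipeline would use a purpose-built corruption library rather than hand-rolled transforms:

```python
# Minimal sketch: in-silico image corruptions (blur, Gaussian noise,
# JPEG compression) for robustness probing. "input.png" is a placeholder.
import io
import numpy as np
from PIL import Image, ImageFilter

img = Image.open("input.png").convert("RGB")

# Sensor-style corruption: defocus approximated with Gaussian blur.
blurred = img.filter(ImageFilter.GaussianBlur(radius=3))

# Digital corruption: additive Gaussian noise.
arr = np.asarray(img, dtype=np.float32)
noisy = np.clip(arr + np.random.normal(0, 25, arr.shape), 0, 255)
noisy = Image.fromarray(noisy.astype(np.uint8))

# Digital corruption: lossy JPEG compression at low quality.
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=10)
compressed = Image.open(buf)
```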

Dataset Analysis

Techniques to Manipulate, Understand, & Validate Datasets

  • Dataset Splitting (Train/Test/Val)
  • Dataset Validation Checks
  • Metrics for Dataset Drift, Similarity, & Comparison
  • Metrics for Dataset Complexity (Irreducible Error)
  • Anomaly & Outlier Detection
  • Dataset Bias Detection
  • Label Error Detection
  • Data Sufficiency
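
A minimal sketch of two of these techniques, a train/validation/test split and a simple per-feature drift check (two-sample Kolmogorov-Smirnov test), using scikit-learn and SciPy on placeholder data:

```python
# Minimal sketch: dataset splitting plus a per-feature drift check.
# X and y are synthetic placeholders for a real, versioned dataset.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)

# Two-stage split: 70% train, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Drift check: compare each feature's train vs. test distribution.
for i in range(X.shape[1]):
    stat, p = ks_2samp(X_train[:, i], X_test[:, i])
    print(f"feature {i}: KS={stat:.3f}, p={p:.3f}")
```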

Model Analysis

Techniques to Better Understand Model Behaviors & Limitations

  • XAI Saliency Maps
  • Clustering of Model Errors
  • Quantitative Metrics for Saliency Maps
  • Competency
  • Concept Probing, Deep Inversions
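
A minimal sketch of the first bullet, a vanilla gradient saliency map in PyTorch; the model here is a stand-in for a trained classifier, and real XAI workflows would typically use an attribution library such as Captum:

```python
# Minimal sketch: vanilla gradient saliency map in PyTorch.
# The Sequential model is a placeholder for a trained image classifier.
import torch

model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
)
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)
logits = model(image)
score = logits[0, logits.argmax()]   # score of the predicted class
score.backward()

# Saliency: max absolute input gradient across color channels.
saliency = image.grad.abs().max(dim=1).values.squeeze(0)  # shape (32, 32)
```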

Adversarial Robustness

Assess Model Robustness to Adversarial Attack

  • Patch Attacks
  • Physically Realizable Attacks
  • Attack Metrics
  • Auto Attack (And Similar)
  • White Box Attacks
  • Effects of Attacks / Defenses on Model Performance (Trade-Offs)
  • Black Box Attacks
  • Data Poisoning
  • Attack Cards
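
A minimal sketch of one classic white-box attack, the Fast Gradient Sign Method (FGSM), in PyTorch; the model is a placeholder, and toolkits such as the Adversarial Robustness Toolbox provide hardened implementations of this and many of the attack classes above:

```python
# Minimal sketch: FGSM, a basic white-box adversarial attack.
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03):
    """Return an adversarial example within an L-infinity ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to valid range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Placeholder classifier and input; a real assessment targets a fielded model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
x_adv = fgsm(model, x, label)
```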

Model Cards, Data Cards, and T&E Reports

Generate Useful Artifacts from Model & Data T&E for Traceability & Reporting

  • Ingest Various T&E Metrics
  • Compatible with MLOps Tools
  • Machine Readable Model + Data Card
  • Graph / Viz Generation
  • PPTX Report Generation
  • Reports / Results Merged into Reporting Requirements
  • Policy Compliance Mapping
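
A minimal sketch of a machine-readable model card serialized to JSON; the fields and values are illustrative placeholders, not a mandated DoD schema:

```python
# Minimal sketch: a machine-readable model card written to JSON.
# All field names and values are illustrative.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str
    training_data: str
    metrics: dict = field(default_factory=dict)
    limitations: list = field(default_factory=list)

card = ModelCard(
    name="vehicle-detector",                  # hypothetical model name
    version="1.2.0",
    intended_use="Overhead vehicle detection for T&E experimentation",
    training_data="internal-dataset-v3",      # hypothetical dataset reference
    metrics={"mAP": 0.71, "recall": 0.83},    # placeholder results
    limitations=["Degrades in heavy rain or snow"],
)

with open("model_card.json", "w") as f:
    json.dump(asdict(card), f, indent=2)
```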

AI T&E Platform

Orchestrated Set of MLOps Tools, Tailored to Jumpstart AI T&E Speed & Maturity

  • MLflow
  • Experiment Tracking
  • Model Registry
  • Viz Dashboard
  • Example Workbooks & Workflows
  • JupyterLab / IDE
  • Workflow Orchestrator
  • Database / Object Store
  • Result Introspection
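
A minimal sketch of the experiment-tracking piece using MLflow's tracking API; the experiment name, parameters, and metric values are placeholders:

```python
# Minimal sketch: logging a T&E run to MLflow's experiment tracker.
# Assumes a local or remote MLflow tracking server is configured.
import mlflow

mlflow.set_experiment("ai-te-robustness-sweep")   # placeholder experiment name

with mlflow.start_run(run_name="gaussian-noise-severity-25"):
    mlflow.log_param("corruption", "gaussian_noise")
    mlflow.log_param("severity", 25)
    mlflow.log_metric("accuracy", 0.87)           # placeholder result
    mlflow.log_metric("f1", 0.84)
```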

Target Roles to Support within the ML Lifecycle

AI T&E Engineer

  • Measure Performance for Various Settings/Contexts
  • Verification of Requirement Satisfaction
  • Validation of Evaluation Results
  • Quantitative Evaluation of Risks
  • Explore Potential Unknown Risks to Identify New Risks
  • Develop Summary Report of Findings
  • Compare Models

ML Developer

  • Exploratory Data Analysis
  • Statistical Analysis
  • Data Visualization & Analysis
  • Model Experimentation
  • Feature Engineering
  • Model Training & Continuous Validation
  • Uses MLOps
  • Uses End-to-End Model Pipelines
  • Shifted-Left Risk Evaluation

Data Analyst

  • Data Visualization & Analysis
  • Statistical Analysis
  • Data Validation
  • Data Prep & Cleaning

AI Red Teamer

  • Perform Adversarial Assessments of AI Models
  • Model Building with Proxy Data
  • Threat Analysis Tree
  • Find (Non-Adversarial) Failure Modes
  • Often Have Limited Model Information
  • Generation of Attack Cards
  • Lessons Learned Fed to Developer Team

Other Roles

  • MLOps Engineer
  • Certifier / Acquisition Decision-Maker
  • Leadership
  • Model End-User
  • Software Engineer
  • Risk Assessor
  • Subject Matter Experts
  • Security
  • Authorizing Official (AO)
  • Mission Owner
  • Troubleshooters & Operators

Key Objectives

Deliverables & Value Proposition

Measures of Success and How We Are Enabling AI in Government

  • Apps Graduated from DARPA & Closed Projects to Fully Open-Source in Less than One Year
  • New or Improved Open-Source Packages Serving the Entire Data Science Community
  • Phase 2 Will Deliver IL5 Accreditation to Meet the Need for a Federated MLOps & AI T&E Platform
  • Active Users & Testers Internal to the Government Prototype Playground

DEVELOP

Develop the T&E Platform

  • Data Hub: Database / Object Store / Data Versioning
  • Model Hub: AI/ML Model Registry
  • Support T&E Tools
  • Workflow Orchestration Engine / Workflow & Experiment Tracking
  • IDE / Interactive Python Notebooks; Python Environment Management

SIMPLIFY

Simplify the Deployment of the Platform

  • Automatic Infrastructure Provisioning
  • Single Interface Environment Management
  • Support Local Machine, On-Premise, and On-Cloud
  • Multi-GPU & Resource Management

ENABLE

Enable T&E Engineers to Use the Platform and Libraries with Ease

  • Configure and Run Tests
  • Visualization / Dashboards
  • IDE / Interactive Python Notebooks
  • Report Generation
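
A minimal sketch of the "configure and run tests" workflow: a declarative YAML config drives a set of pass/fail metric checks. The config keys and the evaluate helper are hypothetical illustrations, not a JATIC API:

```python
# Minimal sketch: config-driven test execution for a T&E engineer.
# The config schema and evaluate() helper are hypothetical.
import yaml  # provided by the PyYAML package

CONFIG = """
model: vehicle-detector:1.2.0
dataset: internal-dataset-v3
tests:
  - metric: accuracy
    threshold: 0.85
  - metric: recall
    threshold: 0.80
"""

def evaluate(model, dataset, metric):
    # Placeholder: a real harness would load the model, run inference on
    # the dataset, and compute the requested metric.
    return {"accuracy": 0.87, "recall": 0.83}[metric]

cfg = yaml.safe_load(CONFIG)
for test in cfg["tests"]:
    score = evaluate(cfg["model"], cfg["dataset"], test["metric"])
    status = "PASS" if score >= test["threshold"] else "FAIL"
    print(f"{test['metric']}: {score:.2f} ({status})")
```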

Implementation Strategy

Strategic Partnerships

Formed alliances with key DoD programs to leverage expertise and ensure alignment with operational objectives.

Leveraging AI Expertise

Engaged with Naval Information Warfare Centers to solicit feedback and integrate AI capabilities effectively.

User-Driven Development Cycles

Prioritized user needs and feedback to iteratively refine AI products, ensuring alignment with operational requirements.

Continuous Improvement

Planned subsequent increments to tackle evolving challenges and incorporate emerging technologies over time.

Impacts + Achievements

Before: The DoD lacked a cohesive framework for the T&E of AI-enabled systems (AIES), making it difficult to assess their readiness and reliability for defense applications. JATIC products had few users and little demand, and many were not easily available or interoperable. Adoption of AI technologies was further hampered by the absence of standardized T&E processes; by a lack of centralized expertise, knowledge, and investment; and by uncertainty about novel risks, all of which posed risks to mission-critical operations.

After: With SEPTAR, the DoD now has a structured approach to T&E for AIES, potentially reducing the time and resources required to validate AI systems for operational use. The initiative has produced best practices aligned with DoD processes, enhancing the trustworthiness and reliability of AIES in defense operations. User demand for JATIC products is growing among major DoD AI programs, with available, interoperable products and improved documentation indicating a shift toward greater maturity and usability, and efforts are underway to further expand user adoption and feedback.

Accelerate Your Mission with Meaningful and Targeted Digital Transformations

Learn more about how MetroStar’s playbooks, accelerators, and industry experts can help move your organization’s mission forward.

Work With Us