In the era of industrial digital transformation, leveraging operational data for AI-driven insights is critical for optimizing production and minimizing downtime. However, moving time-series data from traditional industrial systems such as AVEVA PI Systems (PI Data Archive and Asset Framework) to cloud-based analytics platforms such as Databricks can be challenging. This article walks through integrating AVEVA PI Systems with Databricks using Twin Talk from EOT.AI, enabling real-time, scalable analytics for industrial operations. By the end of this guide, you will understand:
  • How Twin Talk retrieves and transforms PI System data.
  • How to configure Databricks to receive industrial time-series data.
  • How to establish secure connections and automate data pipelines.
  • The benefits of using AI/ML in Databricks on industrial data.

Understanding the Integration Architecture

AVEVA PI Systems Overview

AVEVA PI Systems store real-time and historical industrial data, typically structured as:

  • PI Data Archive: Stores raw sensor data in time-series format.
  • Asset Framework (AF): Organizes data into a structured hierarchy for contextualized insights.

Twin Talk’s Role

Twin Talk acts as the middleware that connects AVEVA PI Systems with Databricks by:

  1. Retrieving time-series data from PI Data Archive and AF.
  2. Transforming and structuring the data into a format suited for cloud ingestion.
  3. Loading the data into Databricks tables using the Databricks SQL API.
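
To make the target row shape concrete, here is a minimal Python sketch of the transformation step. The function name and input layout are illustrative assumptions, not Twin Talk's actual internals:

# Illustrative only: flatten one PI/AF reading into the row shape used by
# the Databricks landing table defined later in this guide.
from datetime import datetime, timezone

def to_landing_row(af_path, attribute, value, ts):
    """Map one AF attribute value to a (path, name, value, timestamp) row."""
    return {
        "hierarchy_path": af_path,      # AF element path for the asset
        "measure_name": attribute,      # PI point / AF attribute name
        "measure_value": float(value),  # sensor reading
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
    }

row = to_landing_row("Plant/Line1/Fan01", "BearingTemp", 71.3,
                     datetime.now(timezone.utc))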

Why Databricks?

Databricks provides:

  • A scalable cloud-based data lakehouse architecture.
  • Built-in machine learning and real-time analytics.
  • Seamless integration with Apache Spark for large-scale industrial data processing.

Step-by-Step Guide: Moving Data to Databricks

Preparing Databricks for Twin Talk Ingestion

Before Twin Talk can send data, Databricks must be configured to receive it.

Step 1: Create a Databricks Service Principal

A Service Principal is required to authenticate Twin Talk with Databricks.

  1. Log in to the Databricks Account Console.
  2. Navigate to User Management > Service Principals.
  3. Click Add Service Principal, provide a name, and create it.
  4. Under “OAuth secrets”, click Generate Secret, and securely store it.
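
Twin Talk authenticates with this service principal's client ID and OAuth secret. For reference, the Python sketch below shows how those credentials are exchanged for a short-lived bearer token via the Databricks workspace OAuth token endpoint; the host and credentials are placeholders:

# Illustrative sketch of the Databricks OAuth machine-to-machine flow:
# exchange the service principal's client ID and OAuth secret for a
# bearer token. Host and credentials below are placeholders.
import requests

WORKSPACE = "https://<your-workspace>.cloud.databricks.com"

resp = requests.post(
    f"{WORKSPACE}/oidc/v1/token",
    auth=("<service-principal-client-id>", "<oauth-secret>"),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]  # used as "Authorization: Bearer ..."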

Step 2: Assign Required Roles & Privileges

Grant the Twin Talk service principal SELECT and MODIFY privileges on the landing tables (in Unity Catalog, MODIFY is the privilege that permits INSERT).

  • Assign roles within Unity Catalog for controlled access.
  • Enable audit logs to track data access and usage.
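
If you prefer to script the grants instead of using the UI, a sketch like the following works. It assumes the databricks-sql-connector package; the catalog, schema, warehouse path, and principal ID are placeholders:

# Illustrative sketch: grant the Twin Talk service principal read/write
# access to the landing table. Assumes the databricks-sql-connector
# package; names and IDs below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<admin-token>",
) as conn:
    cur = conn.cursor()
    # The principal needs a path to the table...
    cur.execute("GRANT USE CATALOG ON CATALOG <catalog> TO `<sp-application-id>`")
    cur.execute("GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<sp-application-id>`")
    # ...plus read/write on the table itself (MODIFY covers INSERT).
    cur.execute(
        "GRANT SELECT, MODIFY ON TABLE <catalog>.<schema>.realtime_historian "
        "TO `<sp-application-id>`"
    )
    cur.close()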

Step 3: Define Databricks Tables and Schemas

The structure of landing tables in Databricks should match the industrial data format:

CREATE TABLE realtime_historian (
    hierarchy_path STRING,    -- AF element path for the asset
    measure_name STRING,      -- PI point / AF attribute name
    measure_value DOUBLE,     -- sensor reading
    timestamp TIMESTAMP       -- time of the reading
);

  • Different AF Views can map to different Databricks tables based on use cases.

Configuring Twin Talk for Data Ingestion

Step 4: Connect Twin Talk to the AF Server

Update the TwinTalk.Config file with the AF Server details:

<add key="PiSystemName" value="<AF server IP or domain>" />
<add key="AfDataBaseName" value="<AF database name>" />
<add key="AfUserName" value="<AF service account username>" />
<add key="AfPassword" value="<encrypted password>" />

This enables Twin Talk to retrieve data from the PI System in real time.

Step 5: Configure Twin Talk Queries

Twin Talk uses AF Queries to extract data from the PI System:

$ParentTemplate:*Fan* 

  • Queries are designed to select specific PI Points based on templates and metadata.
  • Data transformation can include pivoting, aggregation, and filtering.
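
As a generic illustration of the pivoting mentioned above (not Twin Talk's internal implementation), here is how long-format time-series rows reshape into one column per measure using pandas:

# Generic pandas illustration: pivot long-format time-series rows into
# one column per measure. Not Twin Talk's internal implementation.
import pandas as pd

long_df = pd.DataFrame({
    "timestamp": ["2025-01-01 00:00", "2025-01-01 00:00"],
    "measure_name": ["BearingTemp", "VibrationRMS"],
    "measure_value": [71.3, 0.42],
})

# One row per timestamp, one column per measure.
wide_df = long_df.pivot(index="timestamp", columns="measure_name",
                        values="measure_value")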

Automating Data Transfer to Databricks

Step 6: Establish Secure API Communication

In the Twin Talk UI, set up API parameters for Databricks:

{
  "User-Agent": "EOT_TWIN_TALK",
  "X-Databricks-Authorization-Token-Type": "KEYPAIR_JWT",
  "Content-Type": "application/json",
  "tt_statement": "INSERT INTO realtime_historian VALUES $VALUES"
}

  • Requests are sent over HTTPS, so data is encrypted in transit.
  • Twin Talk inserts data directly through the Databricks SQL API.
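
For reference, the Python sketch below shows the same kind of insert issued directly against the Databricks SQL Statement Execution API (POST /api/2.0/sql/statements). The host, warehouse ID, and token are placeholders, and the literal VALUES row stands in for Twin Talk's $VALUES substitution:

# Illustrative sketch of the kind of request Twin Talk issues against the
# Databricks SQL Statement Execution API. Host, warehouse ID, and token
# are placeholders; the literal row stands in for $VALUES.
import requests

WORKSPACE = "https://<your-workspace>.cloud.databricks.com"

resp = requests.post(
    f"{WORKSPACE}/api/2.0/sql/statements",
    headers={
        "Authorization": "Bearer <access-token>",
        "User-Agent": "EOT_TWIN_TALK",
        "Content-Type": "application/json",
    },
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "INSERT INTO realtime_historian VALUES "
                     "('Plant/Line1/Fan01', 'BearingTemp', 71.3, "
                     "timestamp'2025-01-01 00:00:00')",
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
print(resp.json()["status"])  # e.g. {"state": "SUCCEEDED"}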

Step 7: Execute the Data Pipeline

Once Twin Talk is connected and configured:

  • The Insert SQL Creator in Twin Talk helps generate custom SQL statements for Databricks (illustrated in the sketch after this list).
  • The AF Query Timer ensures continuous, scheduled ingestion of real-time industrial data.
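
To illustrate what that substitution produces, here is a hedged sketch of rendering buffered rows into the $VALUES placeholder; Twin Talk's actual Insert SQL Creator may differ:

# Illustrative sketch of the $VALUES substitution: render buffered rows
# into a multi-row INSERT statement. Twin Talk's actual Insert SQL
# Creator may differ; a production pipeline should prefer parameterized
# statements over string building.
def render_insert(template, rows):
    def sql_literal(v):
        return f"'{v}'" if isinstance(v, str) else str(v)
    values = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return template.replace("$VALUES", values)

stmt = render_insert(
    "INSERT INTO realtime_historian VALUES $VALUES",
    [("Plant/Line1/Fan01", "BearingTemp", 71.3, "2025-01-01 00:00:00")],
)
# -> INSERT INTO realtime_historian VALUES
#    ('Plant/Line1/Fan01', 'BearingTemp', 71.3, '2025-01-01 00:00:00')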

Step 8: Validate and Monitor Data Flow

In Databricks, verify that the table is being populated:

SELECT * FROM realtime_historian LIMIT 10;

Check audit logs to ensure successful ingestion:

SELECT * FROM system.access.audit WHERE user_agent LIKE 'EOT_TWIN_TALK';

Results: Unlocking the Power of Industrial Data in Databricks

Once your data is flowing into Databricks, you can:

  1. Analyze Operational Trends
    • Use SQL queries or Apache Spark for in-depth analysis.
    • Detect anomalies in sensor data (see the PySpark sketch after this list).
  2. Apply AI/ML for Predictive Maintenance
    • Train ML models to predict equipment failures.
    • Optimize production schedules using Databricks AutoML.
  3. Integrate with Business Intelligence Tools
    • Use Power BI or Tableau for real-time dashboards.
    • Correlate industrial data with business KPIs.
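
As a concrete starting point for trend and anomaly analysis, the PySpark sketch below flags readings that deviate sharply from each measure's recent rolling statistics. It assumes a Databricks notebook (where spark is predefined); the 3-sigma threshold and 100-row window are arbitrary illustrative choices:

# Simple z-score anomaly screen over the landing table. Assumes a
# Databricks notebook where `spark` is predefined; threshold and window
# size are arbitrary illustrative choices.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.table("realtime_historian")

# Rolling stats over the previous 100 readings of each measure.
w = (Window.partitionBy("hierarchy_path", "measure_name")
     .orderBy("timestamp")
     .rowsBetween(-100, -1))

scored = (df
    .withColumn("rolling_mean", F.avg("measure_value").over(w))
    .withColumn("rolling_std", F.stddev("measure_value").over(w))
    .withColumn("is_anomaly",
                F.abs(F.col("measure_value") - F.col("rolling_mean"))
                > 3 * F.col("rolling_std")))

scored.filter("is_anomaly").show()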

Conclusion

Moving time-series data from AVEVA PI Systems into Databricks using Twin Talk unlocks new possibilities for industrial analytics. By leveraging Databricks’ cloud-based AI and ML capabilities, you can optimize operations, reduce downtime, and gain real-time insights from your industrial data.

By following this guide, you have:

  • Configured Databricks to receive industrial data.
  • Set up Twin Talk for real-time data ingestion.
  • Established a scalable data pipeline.
  • Enabled AI-driven predictive analytics.

Now, you can start applying advanced data science and machine learning on your industrial data to drive operational excellence!

Ready to get started?
Implement your first data pipeline today and explore the power of AI!

Download a full breakdown of how to create data pipelines from the PI System to Databricks here:
Twin Talk Aveva – Databricks Data Pipelines
