
Data Pipelines

Stream device data to external services and cloud platforms

Overview

Data Pipelines connect your devices to external services and cloud platforms. Stream IoT data in real time to analytics tools, databases, data lakes, and third-party applications without writing integration code.

Supported Destinations

  • ☁️ Cloud Services: AWS IoT, Azure IoT Hub, Google Cloud IoT, IBM Watson IoT
  • 🗄️ Databases: PostgreSQL, MongoDB, InfluxDB, TimescaleDB, MySQL
  • 📊 Analytics: Elasticsearch, Grafana, Tableau, Power BI, Google Analytics
  • 🔗 Custom Webhooks: Any REST API endpoint with custom headers and authentication

Creating a Data Pipeline

1. Access Pipelines Section: Navigate to Data Pipelines from the main menu.
2. Create New Pipeline: Click the add button and provide a descriptive name for your pipeline.
3. Select Data Source: Choose which devices or device groups will feed data into this pipeline.
4. Configure Destination: Select the destination type and provide connection credentials (API keys, connection strings, endpoints).
5. Map Data Fields: Define how device data fields map to the destination schema. Apply transformations if needed.
6. Set Delivery Options: Configure batching, frequency, retry policies, and error handling (a configuration sketch follows these steps).
7. Test and Activate: Send test data to verify your configuration, then activate the pipeline to begin streaming.
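
Although these steps happen in the UI, it can help to picture the result as a single configuration object. The sketch below is a hypothetical pipeline definition in TypeScript; every field name is illustrative and does not reflect the platform's actual schema.

```typescript
// Hypothetical pipeline definition assembled by steps 2-6 above.
// Every field name is illustrative, not the platform's actual schema.
interface PipelineDefinition {
  name: string;                               // step 2: descriptive name
  source: { deviceGroups: string[] };         // step 3: data source
  destination: {
    type: "s3" | "elasticsearch" | "webhook"; // step 4: destination type
    credentials: Record<string, string>;      //         connection details
  };
  fieldMap: Record<string, string>;           // step 5: field mapping
  delivery: {
    batchSize: number;                        // step 6: delivery options
    frequencySeconds: number;
    maxRetries: number;
  };
}

const pipeline: PipelineDefinition = {
  name: "temperature-to-s3",
  source: { deviceGroups: ["temperature-sensors"] },
  destination: { type: "s3", credentials: { bucket: "iot-data-lake" } },
  fieldMap: { temp_c: "temperature_celsius" },
  delivery: { batchSize: 1000, frequencySeconds: 300, maxRetries: 3 },
};
```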

Pipeline Configuration Options

| Option         | Description                            | Use Case                              |
|----------------|----------------------------------------|---------------------------------------|
| Batch Size     | Number of records per transmission     | Optimize for destination API limits   |
| Frequency      | How often data is sent                 | Real-time vs. periodic batching       |
| Data Filtering | Send only records matching conditions  | Reduce data volume and costs          |
| Transformation | Modify data structure or values        | Format conversion, unit changes       |
| Retry Policy   | Automatic retry on failures            | Handle temporary connectivity issues  |
| Error Handling | What to do when delivery fails         | Dead-letter queue, alerts, skip       |
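
The Retry Policy and Error Handling rows are easiest to see in code. Below is a minimal sketch of a delivery loop with exponential backoff and a dead-letter fallback; the backoff timings and the dead-letter sink are assumptions, not documented platform behavior.

```typescript
// Minimal delivery-loop sketch showing batching, a retry policy with
// exponential backoff, and dead-letter error handling. All behavior
// here is an assumption for illustration, not documented platform logic.
async function deliverBatch(
  records: unknown[],
  send: (batch: unknown[]) => Promise<void>,
  maxRetries = 3,
): Promise<void> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      await send(records); // one batched transmission
      return;
    } catch (err) {
      if (attempt === maxRetries) {
        // Error handling: route to a dead-letter queue rather than drop.
        await sendToDeadLetter(records, err);
        return;
      }
      // Retry policy: back off 1s, 2s, 4s... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}

// Placeholder dead-letter sink for the sketch.
async function sendToDeadLetter(records: unknown[], err: unknown): Promise<void> {
  console.error(`dead-lettering ${records.length} records`, err);
}
```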

Example Pipelines

IoT Data to AWS S3 Data Lake

Source: All temperature sensors

Destination: AWS S3 bucket

Configuration:

  • Batch 1000 records or every 5 minutes
  • Partition by device ID and date (see the key sketch after this list)
  • Convert to Parquet format for analytics
  • Enable server-side encryption
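
As a rough sketch of the partitioning scheme in this example, a key-naming function might look like the following; the exact path layout is an assumption.

```typescript
// Sketch of an S3 object key partitioned by device ID and date, as in
// the example above. The exact path layout is an assumption.
function s3Key(deviceId: string, recordedAt: Date): string {
  const date = recordedAt.toISOString().slice(0, 10); // YYYY-MM-DD
  return `iot-data/device=${deviceId}/date=${date}/batch.parquet`;
}

console.log(s3Key("sensor-42", new Date("2024-05-01T12:00:00Z")));
// -> iot-data/device=sensor-42/date=2024-05-01/batch.parquet
```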

Real-time Analytics to Elasticsearch

Source: All devices

Destination: Elasticsearch cluster

Configuration:

  • Real-time streaming (no batching)
  • Index by device type and timestamp (see the naming sketch after this list)
  • Include metadata for filtering
  • Retry up to 3 times on failure
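
A hypothetical helper for the "index by device type and timestamp" step might look like this; the monthly granularity and the "iot-" prefix are assumptions.

```typescript
// Sketch of a time-based index name per device type, as described
// above. The monthly granularity and "iot-" prefix are assumptions.
function indexName(deviceType: string, ts: Date): string {
  const yearMonth = ts.toISOString().slice(0, 7).replace("-", "."); // YYYY.MM
  return `iot-${deviceType}-${yearMonth}`;
}

console.log(indexName("thermostat", new Date("2024-05-01T00:00:00Z")));
// -> iot-thermostat-2024.05
```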

Sensor Data to Custom REST API

Source: Environmental monitoring devices

Destination: Custom webhook endpoint

Configuration:

  • POST JSON payload on every reading (see the request sketch after this list)
  • Include Bearer token authentication
  • Custom headers for API versioning
  • Alert on consecutive failures
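
The webhook delivery above amounts to an authenticated POST per reading. A minimal sketch, where the endpoint URL, token, and header names are placeholders rather than real values:

```typescript
// Sketch of the webhook delivery described above. The endpoint URL,
// token, and header names are placeholders, not real values.
async function postReading(reading: Record<string, unknown>): Promise<void> {
  const res = await fetch("https://example.com/iot/readings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <token>", // Bearer token authentication
      "X-API-Version": "2024-01",      // custom header for API versioning
    },
    body: JSON.stringify(reading),
  });
  if (!res.ok) throw new Error(`delivery failed with status ${res.status}`);
}
```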

Data Transformation

Apply transformations to match destination requirements:

Supported Transformations

  • Field Mapping: Rename fields to match destination schema
  • Unit Conversion: Convert temperature, pressure, distance units
  • Data Type Casting: String to number, timestamp formatting
  • Aggregation: Calculate averages, min/max, sums over time windows
  • Enrichment: Add device metadata, location, custom tags
  • Filtering: Include/exclude specific fields or records
  • Custom Scripts: JavaScript transformation for complex logic (sketched after this list)
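
For the custom-script option, a transformation combining field mapping, unit conversion, and enrichment might look like the sketch below (shown in TypeScript for consistency with the other examples); the record shape and tag values are assumptions.

```typescript
// Sketch of a custom transformation script combining field mapping,
// unit conversion, and enrichment. The record shape and the tag
// values are assumptions for illustration.
interface DeviceRecord {
  deviceId: string;
  temp_f: number;
}

function transform(record: DeviceRecord) {
  return {
    device_id: record.deviceId,                    // field mapping
    temperature_c: (record.temp_f - 32) * (5 / 9), // unit conversion
    site: "warehouse-7",                           // enrichment: custom tag
    processed_at: new Date().toISOString(),        // enrichment: metadata
  };
}

console.log(transform({ deviceId: "sensor-42", temp_f: 98.6 }));
// -> { device_id: "sensor-42", temperature_c: 37, ... }
```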

Monitoring and Troubleshooting

Pipeline Metrics

Monitor pipeline health and performance:

  • Records processed (success/failure counts)
  • Throughput (records per second/minute)
  • Delivery latency (time from source to destination)
  • Error rate and failure reasons (see the health-check sketch after this list)
  • Queue depth (pending records)
  • Cost estimation (for cloud destinations)
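
A simple health check built on these metrics might flag a pipeline when its error rate or queue depth crosses a threshold; both thresholds below are assumptions.

```typescript
// Sketch: flag a pipeline as unhealthy when its error rate or queue
// depth crosses a threshold. Both thresholds are assumptions.
interface PipelineMetrics {
  recordsSucceeded: number;
  recordsFailed: number;
  queueDepth: number;
}

function isUnhealthy(m: PipelineMetrics, maxErrorRate = 0.05): boolean {
  const total = m.recordsSucceeded + m.recordsFailed;
  const errorRate = total === 0 ? 0 : m.recordsFailed / total;
  return errorRate > maxErrorRate || m.queueDepth > 10_000;
}
```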

Common Issues and Solutions

| Issue             | Cause                     | Solution                             |
|-------------------|---------------------------|--------------------------------------|
| High failure rate | Invalid credentials       | Verify API keys and permissions      |
| Slow delivery     | Small batch size          | Increase batch size or frequency     |
| Duplicate data    | Retry on transient errors | Implement idempotency in destination |
| Missing fields    | Incorrect mapping         | Review field mapping configuration   |
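
For the "Duplicate data" row, destination-side idempotency usually means deduplicating on a unique record key. A minimal in-memory sketch (a real destination would persist seen keys in its own store):

```typescript
// Sketch of destination-side idempotency for the "Duplicate data" row:
// skip records whose unique key has already been seen. A real
// destination would persist seen keys in its own store, not in memory.
const seen = new Set<string>();

function writeIdempotent(record: { id: string; value: number }): boolean {
  if (seen.has(record.id)) return false; // duplicate from a retry; skip it
  seen.add(record.id);
  // ...persist the record to the destination here...
  return true;
}
```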

Security Best Practices

  • Use secure credential storage, never hardcoding secrets (see the sketch after this list)
  • Enable TLS/SSL for all connections
  • Rotate API keys and credentials regularly
  • Apply principle of least privilege (minimal permissions)
  • Monitor for unauthorized access attempts
  • Encrypt sensitive data before transmission
  • Use VPN or private connectivity for database destinations
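
As a concrete example of the first practice, load credentials from the environment rather than source code; this Node.js-flavored sketch uses placeholder variable names.

```typescript
// Sketch (Node.js-flavored): read destination credentials from
// environment variables with placeholder names, never from source code.
const apiKey = process.env.PIPELINE_API_KEY;
const endpoint = process.env.PIPELINE_ENDPOINT ?? "https://example.com/ingest";

if (!apiKey) {
  throw new Error("PIPELINE_API_KEY is not set; refusing to start pipeline");
}
```
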
Pro Tip: Start with a small subset of devices when testing new pipelines. Gradually scale up after confirming data quality and destination compatibility.