Azure Active Directory (Entra ID)
Azure Active Directory provides enterprise identity and access management for BoilStream. This guide covers integrating with Azure AD for JWT authentication and Microsoft 365 SSO.
Overview
Azure AD integration provides:
- Enterprise SSO: Seamless integration with Microsoft 365 and Azure
- Multi-tenant Support: Support users from multiple Azure AD tenants
- Group-based Authorization: Map Azure AD groups and roles to BoilStream permissions
- Conditional Access: Leverage Azure AD security policies
- High Availability: Microsoft-managed service with enterprise SLA
Prerequisites
- Azure AD tenant with appropriate permissions
- Azure CLI or Azure Portal access
- App registration in Azure AD
- Users and groups configured in Azure AD
Azure AD Setup
1. Create App Registration
Register BoilStream as an application in Azure AD:
# Create app registration
az ad app create \
--display-name "BoilStream Data Platform" \
--sign-in-audience "AzureADMyOrg" \
--web-redirect-uris "https://boilstream.company.com/auth/callback"
# Get the application ID
az ad app list --display-name "BoilStream Data Platform" --query "[0].appId"
{
  "displayName": "BoilStream Data Platform",
  "signInAudience": "AzureADMyOrg",
  "web": {
    "redirectUris": [
      "https://boilstream.company.com/auth/callback"
    ]
  },
  "requiredResourceAccess": [
    {
      "resourceAppId": "00000003-0000-0000-c000-000000000000",
      "resourceAccess": [
        {
          "id": "e1fe6dd8-ba31-4d61-89e7-88639da4683d",
          "type": "Scope"
        }
      ]
    }
  ]
}
2. Configure API Permissions
Grant necessary permissions for group and user information:
# Add Microsoft Graph permissions
az ad app permission add \
  --id "87654321-4321-4321-4321-210987654321" \
--api "00000003-0000-0000-c000-000000000000" \
--api-permissions "e1fe6dd8-ba31-4d61-89e7-88639da4683d=Scope"
# Grant admin consent
az ad app permission admin-consent \
  --id "87654321-4321-4321-4321-210987654321"
Required Permissions:
- User.Read: Read user profile
- GroupMember.Read.All: Read group memberships (optional, for groups in tokens)
3. Configure Token Settings
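Enable group claims in both ID and access tokens. The groups claim is emitted only when the app manifest's groupMembershipClaims attribute is also set (for example, to SecurityGroup); the command below configures the optional claims: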
# Update app to include groups in tokens
az ad app update \
  --id "87654321-4321-4321-4321-210987654321" \
  --optional-claims '{
    "idToken": [
      {
        "name": "groups",
        "essential": false
      }
    ],
    "accessToken": [
      {
        "name": "groups",
        "essential": false
      }
    ]
  }'
4. Create Security Groups
Create groups for authorization mapping:
# Create admin group
az ad group create \
--display-name "BoilStream Admins" \
--mail-nickname "boilstream-admins" \
--description "BoilStream platform administrators"
# Create data producer group
az ad group create \
--display-name "Data Producers" \
--mail-nickname "data-producers" \
--description "Users who can write data to BoilStream"
# Create data analyst group
az ad group create \
--display-name "Data Analysts" \
--mail-nickname "data-analysts" \
--description "Users who can read data from BoilStream"
5. Add Users to Groups
# Get user and group object IDs
USER_ID=$(az ad user show --id "admin@company.com" --query "id" -o tsv)
GROUP_ID=$(az ad group show --group "BoilStream Admins" --query "id" -o tsv)
# Add user to group
az ad group member add --group "$GROUP_ID" --member-id "$USER_ID"
BoilStream Configuration
Environment Variables
Configure BoilStream for Azure AD authentication:
# Enable Azure AD authentication
export AUTH_PROVIDERS="azure-ad"
# Azure AD configuration (REQUIRED)
export AZURE_TENANT_ID="12345678-1234-1234-1234-123456789abc"
export AZURE_CLIENT_ID="87654321-4321-4321-4321-210987654321"
# Optional: Allow multi-tenant (default: false)
export AZURE_ALLOW_MULTI_TENANT="false"
# Authorization groups (use Azure AD group object IDs or display names)
export ADMIN_GROUPS="f1e2d3c4-b5a6-9780-1234-567890abcdef"
export WRITE_GROUPS="a1b2c3d4-e5f6-7890-1234-567890abcdef,data-producers"
export READ_ONLY_GROUPS="Data Analysts,business-users"
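A malformed tenant or group ID is easier to catch before startup than in authentication logs. The Python sketch below checks the variables above. `validate_azure_config` is a hypothetical helper, not part of BoilStream, and it assumes that `"common"` is only meaningful together with `AZURE_ALLOW_MULTI_TENANT="true"` and that group entries are either object IDs or display names.

```python
import re

GUID_RE = re.compile(r"^[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$")

def validate_azure_config(env):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    tenant = env.get("AZURE_TENANT_ID", "")
    client = env.get("AZURE_CLIENT_ID", "")
    multi = env.get("AZURE_ALLOW_MULTI_TENANT", "false").lower() == "true"

    if not tenant:
        problems.append("AZURE_TENANT_ID is required")
    elif tenant == "common":
        # Assumption: the multi-tenant alias must be enabled explicitly
        if not multi:
            problems.append('AZURE_TENANT_ID="common" needs AZURE_ALLOW_MULTI_TENANT="true"')
    elif not GUID_RE.match(tenant):
        problems.append("AZURE_TENANT_ID is not a GUID")

    if not GUID_RE.match(client):
        problems.append("AZURE_CLIENT_ID is missing or not a GUID")

    # Flag entries shaped like object IDs that contain non-hex characters
    for var in ("ADMIN_GROUPS", "WRITE_GROUPS", "READ_ONLY_GROUPS"):
        for entry in (e.strip() for e in env.get(var, "").split(",")):
            shaped_like_guid = len(entry) == 36 and entry.count("-") == 4
            if shaped_like_guid and not GUID_RE.match(entry):
                problems.append(f"{var}: {entry!r} looks like a malformed object ID")
    return problems
```

Calling `validate_azure_config(dict(os.environ))` at startup (after `import os`) and refusing to start on a non-empty result would be one way to use it.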
Multi-tenant Configuration
For multi-tenant scenarios (supporting users from multiple Azure AD tenants):
# Use "common" tenant for multi-tenant support
export AZURE_TENANT_ID="common"
export AZURE_ALLOW_MULTI_TENANT="true"
# Still use your specific client ID
export AZURE_CLIENT_ID="87654321-4321-4321-4321-210987654321"
Docker Compose Example
version: '3.8'
services:
  boilstream:
    image: boilstream:latest
    environment:
      # Azure AD Authentication
      AUTH_PROVIDERS: "azure-ad"
      AZURE_TENANT_ID: "12345678-1234-1234-1234-123456789abc"
      AZURE_CLIENT_ID: "87654321-4321-4321-4321-210987654321"
      AZURE_ALLOW_MULTI_TENANT: "false"
      # Authorization (using group object IDs)
      ADMIN_GROUPS: "f1e2d3c4-b5a6-9780-1234-567890abcdef"
      WRITE_GROUPS: "a1b2c3d4-e5f6-7890-1234-567890abcdef"
      READ_ONLY_GROUPS: "b2c3d4e5-f6a7-8901-2345-678901bcdef0"
      # Other BoilStream config...
      S3_BUCKET: "my-data-lake"
      AWS_REGION: "us-east-1"
JWT Token Claims
BoilStream extracts the following claims from Azure AD JWT tokens:
Standard Claims
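- sub: Subject identifier (unique per user and application; the user's directory object ID is carried in the separate oid claim)
- iss: Issuer (Azure AD tenant)
- aud: Audience (Client ID)
- exp: Expiration timestamp
- iat: Issued at timestamp
- tid: Tenant ID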
Azure AD-Specific Claims
- groups: Array of group object IDs (if configured)
- roles: Array of application role assignments
- scp: Space-separated OAuth scopes
- upn: User Principal Name
- unique_name: Username
- name: Display name
Example JWT Claims
{
  "sub": "f1e2d3c4-b5a6-9780-1234-567890abcdef",
  "iss": "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/v2.0",
  "aud": "87654321-4321-4321-4321-210987654321",
  "exp": 1735689600,
  "iat": 1735686000,
  "tid": "12345678-1234-1234-1234-123456789abc",
  "upn": "admin@company.com",
  "unique_name": "admin@company.com",
  "name": "John Admin",
  "groups": [
    "f1e2d3c4-b5a6-9780-1234-567890abcdef",
    "a1b2c3d4-e5f6-7890-1234-567890abcdef"
  ],
  "roles": [
    "BoilStream.Admin",
    "DataPlatform.User"
  ],
  "scp": "User.Read GroupMember.Read.All"
}
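To make the shape of these claims concrete, here is a minimal Python sketch that turns a decoded payload into a typed structure. `AzureClaims` and `parse_claims` are illustrative names, not BoilStream's internal types. The sketch assumes the payload has already been signature-verified and decoded into a dict.

```python
from dataclasses import dataclass, field

@dataclass
class AzureClaims:
    subject: str
    tenant_id: str
    audience: str
    expires_at: int
    username: str = ""
    groups: list = field(default_factory=list)
    roles: list = field(default_factory=list)
    scopes: list = field(default_factory=list)

def parse_claims(payload):
    """Build AzureClaims from a decoded JWT payload (a dict)."""
    return AzureClaims(
        subject=payload["sub"],
        tenant_id=payload["tid"],
        audience=payload["aud"],
        expires_at=int(payload["exp"]),
        # upn may be absent; fall back to unique_name
        username=payload.get("upn") or payload.get("unique_name", ""),
        # groups and roles are optional arrays
        groups=list(payload.get("groups", [])),
        roles=list(payload.get("roles", [])),
        # scp is a single space-separated string, not an array
        scopes=payload.get("scp", "").split(),
    )
```

Missing optional claims become empty lists, so downstream authorization code does not need to special-case tokens issued without group or role claims.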
Authorization Mapping
BoilStream maps Azure AD claims to authorization context:
Groups Mapping
# Azure AD groups -> BoilStream authorization
groups: ["f1e2d3c4-..."] -> Admin privileges (object ID)
groups: ["Data Analysts"] -> Read access (display name)
roles: ["BoilStream.Admin"] -> Admin privileges (app role)
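The mapping above can be sketched in Python as follows. This is an illustration rather than BoilStream's actual implementation: `access_level` is a hypothetical function, and it assumes that group IDs, group display names, and app roles are all matched against the same `ADMIN_GROUPS`, `WRITE_GROUPS`, and `READ_ONLY_GROUPS` lists, with the highest matching level winning.

```python
def split_list(value):
    """Split a comma-separated setting into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}

def access_level(claims, env):
    """Return "admin", "write", "read", or None for the given claims."""
    # Pool groups and app roles, since either can grant a privilege level
    principals = set(claims.get("groups", [])) | set(claims.get("roles", []))
    levels = (
        ("admin", "ADMIN_GROUPS"),
        ("write", "WRITE_GROUPS"),
        ("read", "READ_ONLY_GROUPS"),
    )
    # Check from most to least privileged so the strongest match wins
    for level, var in levels:
        if principals & split_list(env.get(var, "")):
            return level
    return None
```

Returning `None` for a token with no matching group keeps the default closed: a user who authenticates successfully but belongs to no configured group gets no access.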
Scopes Mapping
# OAuth scopes -> API permissions
scp: "boilstream.admin" -> Admin operations
scp: "boilstream.write" -> Write operations
scp: "boilstream.read" -> Read operations
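As a rough Python illustration of the scope mapping, the sketch below splits the `scp` claim and looks up each scope. `allowed_operations` is a hypothetical name, and the sketch assumes scopes are independent (that is, `boilstream.admin` does not imply write or read); BoilStream's real behavior may differ.

```python
SCOPE_TO_OPERATION = {
    "boilstream.admin": "admin",
    "boilstream.write": "write",
    "boilstream.read": "read",
}

def allowed_operations(scp):
    """Map a space-separated scp claim to the set of permitted operations."""
    # Unrelated scopes such as User.Read are ignored
    return {SCOPE_TO_OPERATION[s] for s in scp.split() if s in SCOPE_TO_OPERATION}
```

For example, `allowed_operations("boilstream.read User.Read")` yields `{"read"}`.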
Client Integration
Getting JWT Tokens
Use the Microsoft Authentication Library (MSAL) or the Azure CLI to obtain JWT tokens. The examples below use Python, Node.js, and the Azure CLI in turn:
from msal import ConfidentialClientApplication
# Configure MSAL
app = ConfidentialClientApplication(
    client_id="87654321-4321-4321-4321-210987654321",
    client_credential="your-client-secret",
    authority="https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc"
)
# Get token using username/password (for testing only)
result = app.acquire_token_by_username_password(
    username="admin@company.com",
    password="YourPassword123!",
    scopes=["87654321-4321-4321-4321-210987654321/.default"]
)
if "access_token" in result:
    access_token = result["access_token"]
    print(f"Bearer {access_token}")
const msal = require('@azure/msal-node');
const clientConfig = {
  auth: {
    clientId: '87654321-4321-4321-4321-210987654321',
    clientSecret: 'your-client-secret',
    authority: 'https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc'
  }
};
const cca = new msal.ConfidentialClientApplication(clientConfig);
async function getToken() {
  const clientCredentialRequest = {
    scopes: ['87654321-4321-4321-4321-210987654321/.default'],
  };
  const response = await cca.acquireTokenByClientCredential(clientCredentialRequest);
  console.log(`Bearer ${response.accessToken}`);
}
# Get token using Azure CLI
az account get-access-token \
--resource "87654321-4321-4321-4321-210987654321" \
--query "accessToken" \
--output tsv
Using Tokens with BoilStream
# Set the token
export TOKEN="eyJhbGciOiJSUzI1NiIs..."
# Use with DuckDB Airport extension
duckdb -s "
INSTALL airport FROM community;
LOAD airport;
SET custom_user_agent = 'Bearer ${TOKEN}';
ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://localhost:50051/');
"
Security Considerations
✅ Best Practices
- Use Specific Tenants: Use specific tenant IDs in production, not "common"
- Group Object IDs: Use group object IDs rather than display names for stability
- Conditional Access: Leverage Azure AD Conditional Access policies
- App Roles: Use Azure AD application roles for fine-grained permissions
⚠️ Security Warnings
- Multi-tenant Risks: Be cautious with AZURE_TENANT_ID="common" in production
- Group Claims Limit: Azure AD limits group claims to 200 groups per JWT; above that limit the groups claim is omitted and replaced by an overage indicator
- Token Validation: Always validate tenant ID in multi-tenant scenarios
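The tenant check in the last point might look like the following Python sketch. `check_tenant` is a hypothetical helper, and it assumes v2.0 tokens, whose issuer has the tenant-specific form shown in the example claims earlier, even when the token was requested through the `common` endpoint. It must run only after the token's signature has been verified.

```python
def check_tenant(claims, allowed_tenants):
    """Return the tenant ID, or raise ValueError if the token's tenant is not allowed."""
    tid = claims.get("tid", "")
    if tid not in allowed_tenants:
        raise ValueError(f"tenant {tid!r} is not allowed")
    # The issuer must be consistent with the tid claim
    expected_iss = f"https://login.microsoftonline.com/{tid}/v2.0"
    if claims.get("iss") != expected_iss:
        raise ValueError("issuer does not match the tid claim")
    return tid
```

Keeping an explicit allow-list of tenant IDs means that enabling `common` widens who can sign in at Azure AD without widening who BoilStream accepts.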
Network Security
# Ensure BoilStream can reach Azure AD JWKS endpoint
curl -v "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/discovery/v2.0/keys"
# For multi-tenant, check common endpoint
curl -v "https://login.microsoftonline.com/common/discovery/v2.0/keys"
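The same reachability check can be scripted. The Python sketch below builds the JWKS URL used in the curl commands above and lists the key IDs (`kid`) published there; the function names are illustrative, and `fetch_key_ids` needs outbound HTTPS access to login.microsoftonline.com.

```python
import json
import urllib.request

def jwks_url(tenant):
    """Return the v2.0 JWKS endpoint for a tenant ID or for "common"."""
    return f"https://login.microsoftonline.com/{tenant}/discovery/v2.0/keys"

def key_ids(jwks):
    """Return the kid values found in a parsed JWKS document."""
    return [key["kid"] for key in jwks.get("keys", []) if "kid" in key]

def fetch_key_ids(tenant, timeout=10):
    """Download the tenant's JWKS (network access required) and list its key IDs."""
    with urllib.request.urlopen(jwks_url(tenant), timeout=timeout) as resp:
        return key_ids(json.load(resp))
```

If the `kid` in a token header is not among the returned IDs, the token was most likely signed for a different tenant or the local key cache is stale.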
Troubleshooting
Common Issues
"Authentication failed" errors:
# Verify tenant and client IDs
az ad app show --id "87654321-4321-4321-4321-210987654321"
# Check JWKS endpoint
curl "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/discovery/v2.0/keys"
"Authorization denied" errors:
# Check user group membership
az ad user get-member-groups --id "admin@company.com"
# List all groups with their object IDs
az ad group list --query "[].{name:displayName, id:id}" -o table
Group claims missing from tokens:
# Verify optional claims configuration
az ad app show --id "87654321-4321-4321-4321-210987654321" --query "optionalClaims"
# Check if groups exceed 200 limit (use Microsoft Graph; az rest attaches a Graph token for the signed-in user)
az rest --method GET --url "https://graph.microsoft.com/v1.0/me/memberOf"
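Group overage can also be detected from the token itself. When a user has too many groups, Azure AD leaves out the `groups` claim and adds `_claim_names` (with a `groups` entry) to JWTs, or `hasgroups` to implicit-flow tokens. The Python sketch below checks for those markers in a decoded payload; `groups_overage` is a hypothetical helper name.

```python
def groups_overage(claims):
    """Return True when the groups claim was replaced by an overage indicator."""
    if "groups" in claims:
        return False
    # JWT overage marker: "_claim_names": {"groups": "src1"}
    claim_names = claims.get("_claim_names") or {}
    # Implicit-flow marker: "hasgroups": true
    return "groups" in claim_names or bool(claims.get("hasgroups"))
```

A `True` result means group membership has to be looked up through Microsoft Graph instead of being read from the token.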
Debug Logging
Enable detailed authentication logging:
export RUST_LOG="boilstream::auth::azure=debug,boilstream::auth::manager=debug"
Token Inspection
Decode JWT tokens to inspect claims:
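# Decode token payload (requires jq); JWT segments are unpadded base64url
payload=$(echo "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+')
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
echo "$payload" | base64 -d | jq .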
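Where jq is not available, the payload can be inspected with a few lines of Python. This sketch decodes the middle segment of the token without verifying the signature, so it is suitable for debugging only; `decode_jwt_payload` is an illustrative name.

```python
import base64
import json

def decode_jwt_payload(token):
    """Return the payload of a JWT as a dict WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

For example, `decode_jwt_payload(token)["tid"]` shows which tenant issued the token.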
Advanced Configuration
Application Roles
Configure custom application roles in Azure AD:
- Go to Azure Portal → App registrations → BoilStream app
- Navigate to App roles → Create app role
- Configure roles like DataAdmin, DataProducer, DataAnalyst
{
  "allowedMemberTypes": ["User"],
  "description": "Data platform administrators",
  "displayName": "Data Admin",
  "id": "12345678-1234-1234-1234-123456789abc",
  "isEnabled": true,
  "value": "DataAdmin"
}
Use roles in BoilStream configuration:
export ADMIN_GROUPS="DataAdmin"
export WRITE_GROUPS="DataProducer"
export READ_ONLY_GROUPS="DataAnalyst"
Custom Scopes
Create custom scopes for API access:
- Azure Portal → App registrations → Expose an API
- Add scopes like boilstream.read, boilstream.write, boilstream.admin
# Use scope-based authorization
export REQUIRED_READ_SCOPES="boilstream.read"
export REQUIRED_WRITE_SCOPES="boilstream.write"
export REQUIRED_ADMIN_SCOPES="boilstream.admin"
Conditional Access Integration
Leverage Azure AD Conditional Access:
- Location-based: Restrict access by geographic location
- Device-based: Require managed devices
- Risk-based: Block risky sign-ins automatically
- MFA: Require multi-factor authentication
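Conditional Access policies are enforced by Azure AD when a token is issued. BoilStream needs no extra configuration: it accepts only tokens that Azure AD has issued and that pass validation, so a sign-in blocked by policy never yields a usable token.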
Microsoft 365 Integration
SharePoint/OneDrive Integration
Combine BoilStream with Microsoft 365 data sources:
-- Example: Stream SharePoint list data
COPY (
SELECT * FROM read_csv('sharepoint_export.csv')
) TO 'boilstream.s3.sharepoint_data';
Power BI Integration
Use BoilStream data in Power BI:
- Configure BoilStream with Azure AD SSO
- Query S3 data lake from Power BI
- Users authenticate once across both platforms
Next Steps
- Google Cloud Integration - Add Google Workspace support
- AWS Cognito Integration - Add AWS identity support
- Troubleshooting Guide - Debug authentication issues