
Azure Active Directory (Entra ID)

Azure Active Directory provides enterprise identity and access management for BoilStream. This guide covers integrating with Azure AD for JWT authentication and Microsoft 365 SSO.

Overview

Azure AD integration provides:

  • Enterprise SSO: Seamless integration with Microsoft 365 and Azure
  • Multi-tenant Support: Accept users from multiple Azure AD tenants
  • Group-based Authorization: Map Azure AD groups and roles to BoilStream permissions
  • Conditional Access: Leverage Azure AD security policies
  • High Availability: Microsoft-managed service with enterprise SLA

Prerequisites

  • Azure AD tenant with appropriate permissions
  • Azure CLI or Azure Portal access
  • App registration in Azure AD
  • Users and groups configured in Azure AD

Azure AD Setup

1. Create App Registration

Register BoilStream as an application in Azure AD:

bash
# Create app registration
az ad app create \
    --display-name "BoilStream Data Platform" \
    --sign-in-audience "AzureADMyOrg" \
    --web-redirect-uris "https://boilstream.company.com/auth/callback"

# Get the application ID
az ad app list --display-name "BoilStream Data Platform" --query "[0].appId"

The corresponding app manifest, including the Microsoft Graph User.Read permission:

json
{
  "displayName": "BoilStream Data Platform",
  "signInAudience": "AzureADMyOrg",
  "web": {
    "redirectUris": [
      "https://boilstream.company.com/auth/callback"
    ]
  },
  "requiredResourceAccess": [
    {
      "resourceAppId": "00000003-0000-0000-c000-000000000000",
      "resourceAccess": [
        {
          "id": "e1fe6dd8-ba31-4d61-89e7-88639da4683d",
          "type": "Scope"
        }
      ]
    }
  ]
}

2. Configure API Permissions

Grant necessary permissions for group and user information:

bash
# Add Microsoft Graph permissions (--id is the application ID from step 1)
az ad app permission add \
    --id "87654321-4321-4321-4321-210987654321" \
    --api "00000003-0000-0000-c000-000000000000" \
    --api-permissions "e1fe6dd8-ba31-4d61-89e7-88639da4683d=Scope"

# Grant admin consent
az ad app permission admin-consent \
    --id "87654321-4321-4321-4321-210987654321"

Required Permissions:

  • User.Read - Read user profile
  • GroupMember.Read.All - Read group memberships (optional, for groups in tokens)

3. Configure Token Settings

Enable group claims in ID tokens:

bash
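# Emit security-group membership in tokens (needed for the groups claim to appear)
az ad app update \
    --id "87654321-4321-4321-4321-210987654321" \
    --set groupMembershipClaims=SecurityGroup

# Update app to include groups in ID and access tokens
az ad app update \
    --id "87654321-4321-4321-4321-210987654321" \
    --optional-claims '{
        "idToken": [
            {
                "name": "groups",
                "essential": false
            }
        ],
        "accessToken": [
            {
                "name": "groups",
                "essential": false
            }
        ]
    }'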

4. Create Security Groups

Create groups for authorization mapping:

bash
# Create admin group
az ad group create \
    --display-name "BoilStream Admins" \
    --mail-nickname "boilstream-admins" \
    --description "BoilStream platform administrators"

# Create data producer group
az ad group create \
    --display-name "Data Producers" \
    --mail-nickname "data-producers" \
    --description "Users who can write data to BoilStream"

# Create data analyst group  
az ad group create \
    --display-name "Data Analysts" \
    --mail-nickname "data-analysts" \
    --description "Users who can read data from BoilStream"

5. Add Users to Groups

bash
# Get user and group object IDs
USER_ID=$(az ad user show --id "admin@company.com" --query "id" -o tsv)
GROUP_ID=$(az ad group show --group "BoilStream Admins" --query "id" -o tsv)

# Add user to group
az ad group member add --group "$GROUP_ID" --member-id "$USER_ID"

BoilStream Configuration

Environment Variables

Configure BoilStream for Azure AD authentication:

bash
# Enable Azure AD authentication
export AUTH_PROVIDERS="azure-ad"

# Azure AD configuration (REQUIRED)
export AZURE_TENANT_ID="12345678-1234-1234-1234-123456789abc"
export AZURE_CLIENT_ID="87654321-4321-4321-4321-210987654321"

# Optional: Allow multi-tenant (default: false)
export AZURE_ALLOW_MULTI_TENANT="false"

# Authorization groups (use Azure AD group object IDs or display names)
export ADMIN_GROUPS="f1e2d3c4-b5a6-9780-1234-567890abcdef"
export WRITE_GROUPS="a1b2c3d4-e5f6-7890-1234-567890abcdef,data-producers"
export READ_ONLY_GROUPS="Data Analysts,business-users"
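
The group variables above are comma-separated lists that may mix object IDs and display names. As a minimal sketch of how such a value could be split and classified (the helper name `parse_group_list` is hypothetical, and this is not BoilStream's actual parsing code):

```python
import uuid


def parse_group_list(raw: str) -> dict:
    """Split a comma-separated group setting into object IDs and display names.

    Hypothetical helper for illustration; BoilStream's real parsing may differ.
    """
    object_ids, names = [], []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        try:
            # Entries that parse as a UUID are treated as group object IDs
            object_ids.append(str(uuid.UUID(item)))
        except ValueError:
            names.append(item)
    return {"object_ids": object_ids, "names": names}


parsed = parse_group_list("a1b2c3d4-e5f6-7890-1234-567890abcdef,data-producers")
print(parsed)
```

Running a value through a check like this before deployment makes it easy to spot a mistyped object ID, which would otherwise be treated as a display name.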

Multi-tenant Configuration

For multi-tenant scenarios (supporting users from multiple Azure AD tenants):

bash
# Use "common" tenant for multi-tenant support
export AZURE_TENANT_ID="common"
export AZURE_ALLOW_MULTI_TENANT="true"

# Still use your specific client ID
export AZURE_CLIENT_ID="87654321-4321-4321-4321-210987654321"

Docker Compose Example

yaml
version: '3.8'
services:
  boilstream:
    image: boilstream:latest
    environment:
      # Azure AD Authentication
      AUTH_PROVIDERS: "azure-ad"
      AZURE_TENANT_ID: "12345678-1234-1234-1234-123456789abc"
      AZURE_CLIENT_ID: "87654321-4321-4321-4321-210987654321"
      AZURE_ALLOW_MULTI_TENANT: "false"
      
      # Authorization (using group object IDs)
      ADMIN_GROUPS: "f1e2d3c4-b5a6-9780-1234-567890abcdef"
      WRITE_GROUPS: "a1b2c3d4-e5f6-7890-1234-567890abcdef"
      READ_ONLY_GROUPS: "b2c3d4e5-f6a7-8901-2345-678901bcdef0"
      
      # Other BoilStream config...
      S3_BUCKET: "my-data-lake"
      AWS_REGION: "us-east-1"

JWT Token Claims

BoilStream extracts the following claims from Azure AD JWT tokens:

Standard Claims

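  • sub - Subject identifier (in Azure AD this is unique per user and application; the user's object ID is carried in the oid claim)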
  • iss - Issuer (Azure AD tenant)
  • aud - Audience (Client ID)
  • exp - Expiration timestamp
  • iat - Issued at timestamp
  • tid - Tenant ID

Azure AD-Specific Claims

  • groups - Array of group object IDs (if configured)
  • roles - Array of application role assignments
  • scp - Space-separated OAuth scopes
  • upn - User Principal Name
  • unique_name - Username
  • name - Display name

Example JWT Claims

json
{
  "sub": "f1e2d3c4-b5a6-9780-1234-567890abcdef",
  "iss": "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/v2.0",
  "aud": "87654321-4321-4321-4321-210987654321",
  "exp": 1735689600,
  "iat": 1735686000,
  "tid": "12345678-1234-1234-1234-123456789abc",
  "upn": "admin@company.com",
  "unique_name": "admin@company.com",
  "name": "John Admin",
  "groups": [
    "f1e2d3c4-b5a6-9780-1234-567890abcdef",
    "a1b2c3d4-e5f6-7890-1234-567890abcdef"
  ],
  "roles": [
    "BoilStream.Admin",
    "DataPlatform.User"
  ],
  "scp": "User.Read GroupMember.Read.All"
}
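
To illustrate how the claims above might be collected into a single structure, here is a minimal sketch. It assumes the token signature has already been verified elsewhere; `AuthContext` and `build_context` are hypothetical names, not BoilStream's internal types.

```python
from dataclasses import dataclass, field
from typing import Optional
import time


@dataclass
class AuthContext:
    """Hypothetical container for the claims listed above."""
    subject: str
    tenant_id: str
    groups: list = field(default_factory=list)
    roles: list = field(default_factory=list)
    scopes: list = field(default_factory=list)


def build_context(claims: dict, now: Optional[float] = None) -> AuthContext:
    """Build an AuthContext from already-verified claims, rejecting expired tokens."""
    now = time.time() if now is None else now
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return AuthContext(
        subject=claims["sub"],
        tenant_id=claims["tid"],
        groups=list(claims.get("groups", [])),
        roles=list(claims.get("roles", [])),
        # scp is a single space-separated string, so split it into a list
        scopes=claims.get("scp", "").split(),
    )


example_claims = {
    "sub": "f1e2d3c4-b5a6-9780-1234-567890abcdef",
    "tid": "12345678-1234-1234-1234-123456789abc",
    "exp": 1735689600,
    "groups": ["f1e2d3c4-b5a6-9780-1234-567890abcdef"],
    "roles": ["BoilStream.Admin"],
    "scp": "User.Read GroupMember.Read.All",
}
ctx = build_context(example_claims, now=1735686000)
print(ctx)
```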

Authorization Mapping

BoilStream maps Azure AD claims to authorization context:

Groups Mapping

bash
# Azure AD groups -> BoilStream authorization
groups: ["f1e2d3c4-..."] -> Admin privileges (object ID)
groups: ["Data Analysts"] -> Read access (display name)
roles: ["BoilStream.Admin"] -> Admin privileges (app role)
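
The mapping above can be sketched as a set intersection between the token's memberships and the configured group lists. This is an illustration only: the function name `resolve_access` and the precedence order (admin, then write, then read) are assumptions rather than documented BoilStream behavior.

```python
def resolve_access(claims, admin_groups, write_groups, read_groups):
    """Return the highest access level granted by group or app-role membership."""
    # Groups and app roles are pooled, mirroring the mapping shown above
    memberships = set(claims.get("groups", [])) | set(claims.get("roles", []))
    if memberships & set(admin_groups):
        return "admin"
    if memberships & set(write_groups):
        return "write"
    if memberships & set(read_groups):
        return "read"
    return None  # no configured group matched


level = resolve_access(
    {"groups": ["a1b2c3d4-e5f6-7890-1234-567890abcdef"], "roles": []},
    admin_groups=["f1e2d3c4-b5a6-9780-1234-567890abcdef"],
    write_groups=["a1b2c3d4-e5f6-7890-1234-567890abcdef"],
    read_groups=["Data Analysts"],
)
print(level)
```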

Scopes Mapping

bash
# OAuth scopes -> API permissions  
scp: "boilstream.admin" -> Admin operations
scp: "boilstream.write" -> Write operations
scp: "boilstream.read"  -> Read operations
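
Because `scp` arrives as one space-separated string, a scope check has to split it before comparing. A minimal sketch, assuming the three scope names shown above (the `SCOPE_PERMISSIONS` table and function name are hypothetical):

```python
# Hypothetical lookup table based on the scope names shown above
SCOPE_PERMISSIONS = {
    "boilstream.admin": "admin",
    "boilstream.write": "write",
    "boilstream.read": "read",
}


def permissions_from_scp(scp: str) -> set:
    """Translate a space-separated scp claim into a set of permitted operations."""
    return {SCOPE_PERMISSIONS[s] for s in scp.split() if s in SCOPE_PERMISSIONS}


print(permissions_from_scp("boilstream.read boilstream.write User.Read"))
```

Unrelated scopes such as `User.Read` are ignored rather than rejected, which is one possible design choice for tokens that carry scopes for several APIs.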

Client Integration

Getting JWT Tokens

Use Microsoft Authentication Library (MSAL) to obtain JWT tokens:

python
from msal import ConfidentialClientApplication

# Configure MSAL
app = ConfidentialClientApplication(
    client_id="87654321-4321-4321-4321-210987654321",
    client_credential="your-client-secret",
    authority="https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc"
)

# Get token using username/password (for testing only)
result = app.acquire_token_by_username_password(
    username="admin@company.com",
    password="YourPassword123!",
    scopes=["87654321-4321-4321-4321-210987654321/.default"]
)

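if "access_token" in result:
    access_token = result["access_token"]
    print(f"Bearer {access_token}")
else:
    # MSAL reports failures in the result dict instead of raising
    print(result.get("error"), result.get("error_description"))

javascript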
const msal = require('@azure/msal-node');

const clientConfig = {
    auth: {
        clientId: '87654321-4321-4321-4321-210987654321',
        clientSecret: 'your-client-secret',
        authority: 'https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc'
    }
};

const cca = new msal.ConfidentialClientApplication(clientConfig);

async function getToken() {
    const clientCredentialRequest = {
        scopes: ['87654321-4321-4321-4321-210987654321/.default'],
    };

    const response = await cca.acquireTokenByClientCredential(clientCredentialRequest);
    console.log(`Bearer ${response.accessToken}`);
}

// Run the example and surface any token acquisition error
getToken().catch(console.error);

bash
# Get token using Azure CLI
az account get-access-token \
    --resource "87654321-4321-4321-4321-210987654321" \
    --query "accessToken" \
    --output tsv

Using Tokens with BoilStream

bash
# Set the token
export TOKEN="eyJhbGciOiJSUzI1NiIs..."

# Use with DuckDB Airport extension
duckdb -s "
INSTALL airport FROM community; 
LOAD airport;
SET custom_user_agent = 'Bearer ${TOKEN}';
ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://localhost:50051/');
"

Security Considerations

✅ Best Practices

  • Use Specific Tenants: Use specific tenant IDs in production, not "common"
  • Group Object IDs: Use group object IDs rather than display names for stability
  • Conditional Access: Leverage Azure AD Conditional Access policies
  • App Roles: Use Azure AD application roles for fine-grained permissions

⚠️ Security Warnings

  • Multi-tenant Risks: Be cautious with AZURE_TENANT_ID="common" in production
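  • Group Claims Limit: Azure AD includes at most 200 groups in a JWT; for users in more groups, the groups claim is replaced by an overage indicator, so group-based rules may not match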
  • Token Validation: Always validate tenant ID in multi-tenant scenarios
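
As a sketch of the tenant check recommended above, the function below accepts a token only if its `tid` is on an allowlist and its `iss` matches that tenant. It assumes v2.0 tokens, whose issuer has the form shown in the example claims; `ALLOWED_TENANTS` and `validate_tenant` are hypothetical names.

```python
# Hypothetical allowlist of tenant IDs permitted to sign in
ALLOWED_TENANTS = {"12345678-1234-1234-1234-123456789abc"}


def validate_tenant(claims: dict, allowed_tenants: set) -> bool:
    """Accept only tokens whose tenant is allowlisted and whose issuer matches it."""
    tid = claims.get("tid")
    if tid not in allowed_tenants:
        return False
    # v2.0 issuer format; v1.0 tokens use a different issuer URL
    expected_issuer = f"https://login.microsoftonline.com/{tid}/v2.0"
    return claims.get("iss") == expected_issuer


ok = validate_tenant(
    {
        "tid": "12345678-1234-1234-1234-123456789abc",
        "iss": "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/v2.0",
    },
    ALLOWED_TENANTS,
)
print(ok)
```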

Network Security

bash
# Ensure BoilStream can reach Azure AD JWKS endpoint
curl -v "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/discovery/v2.0/keys"

# For multi-tenant, check common endpoint
curl -v "https://login.microsoftonline.com/common/discovery/v2.0/keys"
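
Reaching the endpoint is only part of the picture: a validator also has to find the key whose `kid` matches the one in the token header. The sketch below shows that lookup against a JWKS document of the standard shape (a `keys` array of entries with a `kid` field). The function name and sample key IDs are illustrative only.

```python
def find_signing_key(jwks: dict, kid: str):
    """Return the JWKS entry whose key ID matches the token header, or None."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


# Made-up key IDs standing in for a real JWKS response
sample_jwks = {
    "keys": [
        {"kid": "key-a", "kty": "RSA"},
        {"kid": "key-b", "kty": "RSA"},
    ]
}
print(find_signing_key(sample_jwks, "key-b"))
print(find_signing_key(sample_jwks, "missing"))
```

A `None` result for a token that should be valid usually points at a stale key cache or at keys fetched for the wrong tenant.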

Troubleshooting

Common Issues

"Authentication failed" errors:

bash
# Verify tenant and client IDs
az ad app show --id "87654321-4321-4321-4321-210987654321"

# Check JWKS endpoint
curl "https://login.microsoftonline.com/12345678-1234-1234-1234-123456789abc/discovery/v2.0/keys"

"Authorization denied" errors:

bash
# Check user group membership
az ad user get-member-groups --id "admin@company.com"

# List all groups with their object IDs
az ad group list --query "[].{name:displayName, id:id}" -o table

Group claims missing from tokens:

bash
# Verify optional claims configuration
az ad app show --id "87654321-4321-4321-4321-210987654321" --query "optionalClaims"

# Check if groups exceed 200 limit (use Microsoft Graph)
# az rest authenticates with the signed-in CLI account, so no Authorization header is needed
az rest --method GET --url "https://graph.microsoft.com/v1.0/me/memberOf"
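
When a user belongs to too many groups, Azure AD omits the `groups` claim and adds an overage marker instead: a `_claim_names` entry for `groups`, or `hasgroups` in implicit-flow tokens. A minimal sketch for spotting that in decoded claims (the function name is hypothetical):

```python
def has_group_overage(claims: dict) -> bool:
    """True when the groups claim was replaced by an overage indicator."""
    claim_names = claims.get("_claim_names") or {}
    return "groups" in claim_names or claims.get("hasgroups") in (True, "true")


overage_claims = {
    "sub": "f1e2d3c4-b5a6-9780-1234-567890abcdef",
    "_claim_names": {"groups": "src1"},
}
print(has_group_overage(overage_claims))
print(has_group_overage({"groups": ["a1b2c3d4-e5f6-7890-1234-567890abcdef"]}))
```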

Debug Logging

Enable detailed authentication logging:

bash
export RUST_LOG="boilstream::auth::azure=debug,boilstream::auth::manager=debug"

Token Inspection

Decode JWT tokens to inspect claims:

bash
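# Decode token payload (requires jq); JWT segments are unpadded base64url, so convert and pad first
payload=$(echo "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+')
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
echo "$payload" | base64 -d | jq .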

Advanced Configuration

Application Roles

Configure custom application roles in Azure AD:

  1. Go to Azure Portal → App registrations → BoilStream app
  2. Navigate to App roles → Create app role
  3. Configure roles like DataAdmin, DataProducer, DataAnalyst

json
{
  "allowedMemberTypes": ["User"],
  "description": "Data platform administrators",
  "displayName": "Data Admin",
  "id": "12345678-1234-1234-1234-123456789abc",
  "isEnabled": true,
  "value": "DataAdmin"
}

Use roles in BoilStream configuration:

bash
export ADMIN_GROUPS="DataAdmin"
export WRITE_GROUPS="DataProducer"
export READ_ONLY_GROUPS="DataAnalyst"

Custom Scopes

Create custom scopes for API access:

  1. Azure Portal → App registrations → Expose an API
  2. Add scopes like boilstream.read, boilstream.write, boilstream.admin

bash
# Use scope-based authorization
export REQUIRED_READ_SCOPES="boilstream.read"
export REQUIRED_WRITE_SCOPES="boilstream.write"
export REQUIRED_ADMIN_SCOPES="boilstream.admin"

Conditional Access Integration

Leverage Azure AD Conditional Access:

  • Location-based: Restrict access by geographic location
  • Device-based: Require managed devices
  • Risk-based: Block risky sign-ins automatically
  • MFA: Require multi-factor authentication

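Azure AD enforces Conditional Access policies when it issues a token, so they apply to BoilStream sign-ins without additional BoilStream configuration; BoilStream only accepts tokens that pass validation.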

Microsoft 365 Integration

SharePoint/OneDrive Integration

Combine BoilStream with Microsoft 365 data sources:

sql
-- Example: Stream SharePoint list data
COPY (
    SELECT * FROM read_csv('sharepoint_export.csv')
) TO 'boilstream.s3.sharepoint_data';

Power BI Integration

Use BoilStream data in Power BI:

  1. Configure BoilStream with Azure AD SSO
  2. Query S3 data lake from Power BI
  3. Users authenticate once across both platforms

Next Steps