Google Cloud Platform Authentication

BoilStream can authenticate users and services against Google Cloud Platform (GCP) using Google Cloud Identity, Google Workspace, and Identity and Access Management (IAM).

Overview

GCP integration provides:

  • Google Workspace SSO: Integration with Google Workspace for enterprise users
  • Cloud Identity: Centralized identity management for GCP resources
  • Service Token Support: Both user identity tokens and STS (Security Token Service) tokens
  • Custom Claims: Flexible group and role mapping through custom JWT claims
  • Workspace Domain Validation: Restrict access to specific Google Workspace domains

Prerequisites

  • Google Cloud Project with appropriate permissions
  • Google Workspace domain (for enterprise users) or Cloud Identity setup
  • OAuth 2.0 credentials configured in Google Cloud Console
  • Users and groups configured in Google Workspace or Cloud Identity

Google Cloud Setup

1. Create OAuth 2.0 Credentials

Configure OAuth credentials in Google Cloud Console:

bash
# Enable required APIs
gcloud services enable iamcredentials.googleapis.com
gcloud services enable cloudidentity.googleapis.com

# Create OAuth 2.0 client (requires manual setup in console)
echo "Visit Google Cloud Console to create OAuth 2.0 credentials:"
echo "https://console.cloud.google.com/apis/credentials"
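
The OAuth client should end up with settings equivalent to the following (illustrative; the client itself is created in the console):

json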
{
  "type": "web_application",
  "name": "BoilStream Data Platform",
  "authorized_redirect_uris": [
    "https://boilstream.company.com/auth/callback"
  ],
  "authorized_javascript_origins": [
    "https://boilstream.company.com"
  ]
}

Manual Steps in Google Cloud Console:

  1. Navigate to APIs & Services → Credentials
  2. Click "Create Credentials" → "OAuth 2.0 Client ID"
  3. Select "Web application"
  4. Add authorized redirect URIs for your BoilStream deployment
  5. Note the Client ID for configuration

2. Configure Workspace Groups

Create groups in Google Workspace Admin Console:

bash
# Groups to create in Google Workspace:
# - boilstream-admins@company.com
# - data-producers@company.com  
# - data-analysts@company.com

Google Workspace Admin Console Steps:

  1. Go to admin.google.com
  2. Navigate to Directory → Groups
  3. Create groups for BoilStream authorization
  4. Add users to appropriate groups

3. Service Account (Optional)

For service-to-service authentication, create a service account:

bash
# Create service account
gcloud iam service-accounts create boilstream-service \
    --description="BoilStream service account" \
    --display-name="BoilStream Service"

# Create and download key
gcloud iam service-accounts keys create boilstream-key.json \
    --iam-account=boilstream-service@PROJECT_ID.iam.gserviceaccount.com

BoilStream Configuration

Environment Variables

Configure BoilStream for GCP authentication:

bash
# Enable GCP authentication
export AUTH_PROVIDERS="gcp"

# GCP configuration (set GCP_CLIENT_ID, GCP_PROJECT_ID, or both)
export GCP_CLIENT_ID="123456789-abc.apps.googleusercontent.com"
export GCP_PROJECT_ID="my-project-123"

# Optional: Workspace domain restriction
export GCP_REQUIRE_WORKSPACE_DOMAIN="company.com"

# Optional: Allow STS tokens (default: true)
export GCP_ALLOW_STS_TOKENS="true"

# Optional: Custom claim mapping
export GCP_GROUPS_CLAIM="groups"
export GCP_ROLES_CLAIM="roles"

# Authorization groups (use email addresses or custom claims)
export ADMIN_GROUPS="boilstream-admins@company.com"
export WRITE_GROUPS="data-producers@company.com,etl-services"
export READ_ONLY_GROUPS="data-analysts@company.com,business-users"

Minimal Configuration

BoilStream requires at least one of GCP_CLIENT_ID or GCP_PROJECT_ID. Which of them you set determines the token types that are accepted:

bash
# Option 1: Google identity tokens only
export AUTH_PROVIDERS="gcp"
export GCP_CLIENT_ID="123456789-abc.apps.googleusercontent.com"
export GCP_REQUIRE_WORKSPACE_DOMAIN="company.com"

# Option 2: STS tokens only  
export AUTH_PROVIDERS="gcp"
export GCP_PROJECT_ID="my-project-123"
export GCP_ALLOW_STS_TOKENS="true"

# Option 3: Both (recommended)
export AUTH_PROVIDERS="gcp"
export GCP_CLIENT_ID="123456789-abc.apps.googleusercontent.com"
export GCP_PROJECT_ID="my-project-123"

Docker Compose Example

yaml
version: '3.8'
services:
  boilstream:
    image: boilstream:latest
    environment:
      # GCP Authentication
      AUTH_PROVIDERS: "gcp"
      GCP_CLIENT_ID: "123456789-abc.apps.googleusercontent.com"
      GCP_PROJECT_ID: "my-project-123"
      GCP_REQUIRE_WORKSPACE_DOMAIN: "company.com"
      GCP_ALLOW_STS_TOKENS: "true"
      
      # Custom claim mapping
      GCP_GROUPS_CLAIM: "groups"
      GCP_ROLES_CLAIM: "roles"
      
      # Authorization
      ADMIN_GROUPS: "boilstream-admins@company.com"
      WRITE_GROUPS: "data-producers@company.com"
      READ_ONLY_GROUPS: "data-analysts@company.com"
      
      # Other BoilStream config...
      S3_BUCKET: "my-data-lake"
      AWS_REGION: "us-east-1"

JWT Token Claims

BoilStream supports two types of GCP tokens:

Google Identity Tokens (accounts.google.com)

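Google identity tokens issued through OAuth flows. Google does not add a groups claim to ID tokens by default, so the groups entry in this example assumes the claim is supplied by your own identity setup: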

json
{
  "sub": "123456789012345678901",
  "iss": "https://accounts.google.com",
  "aud": "123456789-abc.apps.googleusercontent.com",
  "exp": 1735689600,
  "iat": 1735686000,
  "email": "admin@company.com",
  "email_verified": true,
  "name": "John Admin",
  "hd": "company.com",
  "groups": [
    "boilstream-admins@company.com",
    "data-engineers@company.com"
  ]
}

Google STS Tokens (sts.googleapis.com)

Security Token Service (STS) tokens for service accounts and workloads:

json
{
  "sub": "123456789012345678901",
  "iss": "https://sts.googleapis.com",
  "aud": "//iam.googleapis.com/projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider",
  "exp": 1735689600,
  "iat": 1735686000,
  "google": {
    "compute_engine": {
      "project_id": "my-project-123",
      "zone": "us-central1-a",
      "instance_id": "1234567890123456789"
    }
  }
}
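
The two token types are distinguished by their `iss` claim. A minimal sketch of that dispatch follows; the function name `classify_gcp_token` is hypothetical, and it assumes the claims have already been extracted from a token whose signature was verified elsewhere. Google may issue ID tokens with the issuer written either with or without the `https://` prefix, so both forms are accepted here.

```python
# Issuer values for the two token families described above.
GOOGLE_IDENTITY_ISSUERS = {"https://accounts.google.com", "accounts.google.com"}
GOOGLE_STS_ISSUER = "https://sts.googleapis.com"


def classify_gcp_token(claims: dict) -> str:
    """Return "identity" or "sts" based on the issuer claim (illustrative only)."""
    issuer = claims.get("iss")
    if issuer in GOOGLE_IDENTITY_ISSUERS:
        return "identity"
    if issuer == GOOGLE_STS_ISSUER:
        return "sts"
    # Anything else is not a token type this page describes.
    raise ValueError(f"Unrecognized issuer: {issuer!r}")
```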

Custom Claims

BoilStream supports configurable custom claims:

bash
# Configure custom claim names
export GCP_GROUPS_CLAIM="company_groups"
export GCP_ROLES_CLAIM="company_roles"

A token carrying these claims would look like:

json
{
  "sub": "123456789012345678901",
  "iss": "https://accounts.google.com", 
  "aud": "123456789-abc.apps.googleusercontent.com",
  "email": "admin@company.com",
  "company_groups": [
    "platform-admins",
    "data-engineers"
  ],
  "company_roles": [
    "BoilStreamAdmin",
    "DataPlatformUser"
  ]
}
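
To make the claim-name indirection concrete, here is a minimal sketch of reading groups and roles from whichever claims `GCP_GROUPS_CLAIM` and `GCP_ROLES_CLAIM` point at. The helper names are hypothetical, and accepting a single string in place of a list is an assumption of this sketch, not documented BoilStream behavior.

```python
from typing import List, Mapping, Tuple


def claim_as_list(claims: Mapping, claim_name: str) -> List[str]:
    """Read one claim as a list of strings, ignoring malformed values."""
    value = claims.get(claim_name, [])
    if isinstance(value, str):
        # Assumption: a lone string is treated as a one-element list.
        return [value]
    if isinstance(value, list):
        return [item for item in value if isinstance(item, str)]
    return []


def groups_and_roles(claims: Mapping, env: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Resolve the configured claim names, defaulting to "groups" and "roles"."""
    groups_claim = env.get("GCP_GROUPS_CLAIM", "groups")
    roles_claim = env.get("GCP_ROLES_CLAIM", "roles")
    return claim_as_list(claims, groups_claim), claim_as_list(claims, roles_claim)
```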

Authorization Mapping

BoilStream maps GCP claims to authorization context:

Google Workspace Integration

bash
# Workspace groups -> BoilStream authorization
groups: ["boilstream-admins@company.com"] -> Admin privileges
groups: ["data-producers@company.com"]    -> Write access
groups: ["data-analysts@company.com"]     -> Read access

Custom Claims Mapping

bash
# Custom claims -> BoilStream authorization
company_groups: ["platform-admins"] -> Admin privileges
company_roles: ["DataProducer"]     -> Write access
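
The mappings above amount to intersecting the names found in a token with the comma-separated lists in `ADMIN_GROUPS`, `WRITE_GROUPS`, and `READ_ONLY_GROUPS`. The sketch below illustrates that idea under two assumptions that the source does not state: the most privileged match wins, and a principal matching no list gets no access. The function names are hypothetical.

```python
from typing import Iterable, Mapping, Set


def parse_group_list(raw: str) -> Set[str]:
    """Split a comma-separated variable such as WRITE_GROUPS into a set of names."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def resolve_access(principal_names: Iterable[str], env: Mapping[str, str]) -> str:
    """Return "admin", "write", "read", or "none" for a set of group or role names."""
    names = set(principal_names)
    # Assumption: check the lists from most to least privileged.
    levels = (
        ("admin", "ADMIN_GROUPS"),
        ("write", "WRITE_GROUPS"),
        ("read", "READ_ONLY_GROUPS"),
    )
    for level, variable in levels:
        if names & parse_group_list(env.get(variable, "")):
            return level
    return "none"
```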

Domain Validation

When GCP_REQUIRE_WORKSPACE_DOMAIN is set, BoilStream validates the hd (hosted domain) claim, which must match the configured domain:

json
{
  "hd": "company.com",
  "email": "user@company.com"
}
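
A minimal sketch of the hosted-domain check, assuming already-verified claims. The function name `hosted_domain_allowed` is hypothetical, and the case-insensitive comparison is a choice made for this sketch. Personal Google accounts carry no `hd` claim at all, so a missing claim is treated as a failure whenever a domain is required.

```python
from typing import Mapping, Optional


def hosted_domain_allowed(claims: Mapping, required_domain: Optional[str]) -> bool:
    """Check the hd claim against GCP_REQUIRE_WORKSPACE_DOMAIN (illustrative only)."""
    if not required_domain:
        # No restriction configured: every Google account passes this check.
        return True
    hosted_domain = claims.get("hd")
    if not isinstance(hosted_domain, str):
        # Consumer accounts have no hd claim, so they are rejected here.
        return False
    # Assumption: domains are compared case-insensitively.
    return hosted_domain.lower() == required_domain.lower()
```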

Client Integration

Getting JWT Tokens

Use Google Auth libraries to obtain JWT tokens:

python
from google.auth.transport.requests import Request
from google.oauth2 import id_token
import google.auth

# Option 1: Service account
credentials, project = google.auth.default()
credentials.refresh(Request())

# Get ID token for service account
target_audience = "123456789-abc.apps.googleusercontent.com"
token = id_token.fetch_id_token(Request(), target_audience)
print(f"Bearer {token}")

# Option 2: User OAuth flow (requires web setup)
from google_auth_oauthlib.flow import Flow

flow = Flow.from_client_config(
    {
        "web": {
            "client_id": "123456789-abc.apps.googleusercontent.com",
            "client_secret": "your-secret",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token"
        }
    },
    scopes=["openid", "email", "profile"]
)

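# Complete the OAuth flow to get an ID token:
#   1. Set flow.redirect_uri and send the user to flow.authorization_url()
#   2. In the callback handler, call flow.fetch_token(code=...)
#   3. Read the ID token from flow.credentials.id_token

javascript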
const { GoogleAuth } = require('google-auth-library');

async function getToken() {
    const auth = new GoogleAuth({
        scopes: ['https://www.googleapis.com/auth/cloud-platform']
    });
    
    // Get ID token
    const client = await auth.getIdTokenClient('123456789-abc.apps.googleusercontent.com');
    const token = await client.idTokenProvider.fetchIdToken('123456789-abc.apps.googleusercontent.com');
    
    console.log(`Bearer ${token}`);
}
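
bash
# Get an identity token for the active gcloud account.
# Note: gcloud accepts --audiences only for service account credentials
# (directly or via --impersonate-service-account), not for a plain user login.
gcloud auth print-identity-token \
    --audiences="123456789-abc.apps.googleusercontent.com"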

# Activate service account and get token
gcloud auth activate-service-account --key-file=boilstream-key.json
gcloud auth print-identity-token \
    --audiences="123456789-abc.apps.googleusercontent.com"

Using Tokens with BoilStream

bash
# Set the token
export TOKEN="eyJhbGciOiJSUzI1NiIs..."

# Use with DuckDB Airport extension
duckdb -s "
INSTALL airport FROM community; 
LOAD airport;
SET custom_user_agent = 'Bearer ${TOKEN}';
ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://localhost:50051/');
"

Security Considerations

✅ Best Practices

  • Workspace Domain: Always set GCP_REQUIRE_WORKSPACE_DOMAIN for corporate environments
  • Service Account Keys: Protect service account keys, use Workload Identity when possible
  • Token Scope: Use minimal scopes required for functionality
  • Custom Claims: Use custom claims for fine-grained authorization control

⚠️ Security Warnings

  • Public Google Accounts: Without domain restrictions, any Google account can authenticate
  • Token Validation: Always validate audience and issuer claims
  • STS Token Risks: STS tokens may have different security properties than user tokens
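
As an illustration of the audience and issuer checks named above, the sketch below inspects the `iss`, `aud`, and `exp` claims of an already-decoded token. It deliberately does not verify the signature, which must happen first against Google's published keys. The function name `claim_problems` and the set of accepted issuers are assumptions based on the token examples on this page.

```python
import time
from typing import List, Mapping, Optional

# Issuers taken from the token examples on this page.
ACCEPTED_ISSUERS = {
    "https://accounts.google.com",
    "accounts.google.com",
    "https://sts.googleapis.com",
}


def claim_problems(claims: Mapping, expected_audience: str,
                   now: Optional[float] = None) -> List[str]:
    """Return a list of problems with iss, aud, and exp (empty list means none found)."""
    current_time = time.time() if now is None else now
    problems = []

    if claims.get("iss") not in ACCEPTED_ISSUERS:
        problems.append("unexpected issuer")

    # The aud claim may be a single string or a list of strings.
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    if expected_audience not in audiences:
        problems.append("audience mismatch")

    expiry = claims.get("exp")
    if not isinstance(expiry, (int, float)) or expiry <= current_time:
        problems.append("expired or missing exp")

    return problems
```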

Network Security

bash
# Ensure BoilStream can reach Google JWKS endpoint
curl -v "https://www.googleapis.com/oauth2/v3/certs"

# Expected response: JSON with RSA public keys
{
  "keys": [
    {
      "kty": "RSA",
      "alg": "RS256", 
      "use": "sig",
      "kid": "abc123...",
      "n": "xyz789...",
      "e": "AQAB"
    }
  ]
}
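
Once the JWKS document has been fetched, it can be useful to list the key IDs it offers and compare them with the `kid` in a failing token's header. The following is a minimal sketch that works on the parsed JSON only and performs no network access; the function name `signing_key_ids` is hypothetical.

```python
from typing import List, Mapping


def signing_key_ids(jwks: Mapping) -> List[str]:
    """Return the kid of every RSA signing key in a parsed JWKS document."""
    key_ids = []
    for key in jwks.get("keys", []):
        # Keep only RSA keys marked for signature use that carry a key ID.
        if key.get("kty") == "RSA" and key.get("use") == "sig" and "kid" in key:
            key_ids.append(key["kid"])
    return key_ids
```

If the `kid` from a token header is absent from this list, the token was signed with a key the endpoint no longer (or does not yet) publish.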

Troubleshooting

Common Issues

"Authentication failed" errors:

bash
# Check client ID and project configuration
gcloud projects describe "my-project-123"

# Verify JWKS endpoint accessibility
curl "https://www.googleapis.com/oauth2/v3/certs"

"Authorization denied" errors:

bash
# Check Google Workspace group membership
# (requires admin access to Workspace)

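# Verify custom claims in token
# (JWT payloads are unpadded base64url, so translate the alphabet first
# and ignore the padding complaint that base64 may print)
echo "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .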

Domain validation failures:

bash
# Check hosted domain claim in token
# (translate the base64url alphabet before decoding)
echo "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .hd

# Verify domain configuration
echo "$GCP_REQUIRE_WORKSPACE_DOMAIN"

Debug Logging

Enable detailed authentication logging:

bash
export RUST_LOG="boilstream::auth::gcp=debug,boilstream::auth::manager=debug"

Token Inspection

bash
# Decode Google ID token
# (translate the base64url alphabet before decoding)
echo "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .

# Check token with Google's tokeninfo endpoint
curl "https://oauth2.googleapis.com/tokeninfo?id_token=$TOKEN"
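
JWT segments are base64url-encoded with the padding stripped, which some `base64` command-line builds reject. As a portable alternative, here is a minimal standard-library sketch that restores the padding and decodes the payload. The function name `decode_jwt_payload` is hypothetical, and the function does not verify the signature, so it is suitable for inspection only.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims of a JWT without verifying it (for debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Expected a JWT with three dot-separated segments")
    payload = parts[1]
    # Restore the '=' padding that the JWT encoding strips.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

For example, `decode_jwt_payload(os.environ["TOKEN"]).get("hd")` returns the hosted domain claim, or `None` when the claim is absent.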

Advanced Configuration

Workload Identity

Use Workload Identity for GKE deployments:

yaml
# Kubernetes service account annotation
apiVersion: v1
kind: ServiceAccount
metadata:
  name: boilstream
  annotations:
    iam.gke.io/gcp-service-account: boilstream-service@PROJECT_ID.iam.gserviceaccount.com

bash
# Bind Kubernetes and Google service accounts
gcloud iam service-accounts add-iam-policy-binding \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/boilstream]" \
    boilstream-service@PROJECT_ID.iam.gserviceaccount.com

Custom Token Validation

For advanced use cases, implement custom token validation:

bash
# Configure custom issuer validation
export GCP_CUSTOM_ISSUER="https://your-custom-issuer.com"

# Use custom JWKS endpoint
export GCP_CUSTOM_JWKS_URL="https://your-domain.com/.well-known/jwks.json"

Directory API Integration

Integrate with Google Workspace Directory API for dynamic group lookup:

  1. Enable Directory API in Google Cloud Console
  2. Grant service account domain-wide delegation
  3. Configure custom claims to fetch groups dynamically
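
python
# Example: fetch the groups a user belongs to via the Directory API
from googleapiclient.discovery import build

# `credentials` must be delegated credentials carrying a Directory API
# read scope, obtained through domain-wide delegation (step 2)
service = build('admin', 'directory_v1', credentials=credentials)
# userKey limits the result to groups this user is a member of
groups = service.groups().list(userKey='user@company.com').execute()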

Google Workspace Integration

Gmail/Calendar Integration

Combine BoilStream with Google Workspace data:

sql
-- Example: Stream calendar events
COPY (
    SELECT * FROM read_json('calendar_export.json')
) TO 'boilstream.s3.calendar_events';

Google Sheets Integration

Stream Google Sheets data through BoilStream:

sql
-- Stream Google Sheets data
COPY (
    SELECT * FROM read_csv('sheets_export.csv')
) TO 'boilstream.s3.sheets_data';

Next Steps