FlightSQL API
BoilStream implements the FlightSQL protocol for SQL-based data access, enabling BI tools and analytical applications to query data using standard SQL over the Arrow Flight protocol.
Overview
FlightSQL provides:
- Standard SQL interface over Arrow Flight protocol
- BI tool compatibility for tools supporting FlightSQL
- High-performance data transfer using Apache Arrow format
- Metadata discovery for catalogs, schemas, and tables
- Prepared statements with parameter binding
Connection Details
Protocol: grpc or grpc+tls
Port: 50250 (consumer flight port)
Authentication: JWT Bearer token (Pro version)
Supported Operations
Query Execution
Statement Query
Execute SQL queries and retrieve results:
sql
-- Simple query
SELECT * FROM topic_12345 WHERE timestamp > NOW() - INTERVAL 1 HOUR;
-- Aggregation
SELECT COUNT(*), AVG(value) FROM topic_events;
-- Joins across topics
SELECT t1.*, t2.status
FROM topic_orders t1
JOIN topic_users t2 ON t1.user_id = t2.id;
Prepared Statements
Prepare and execute parameterized queries:
- Create prepared statement with placeholders
- Bind parameters with actual values
- Execute the prepared query
- Close the prepared statement when done
Metadata Discovery
Get Catalogs
Retrieve available database catalogs:
- Returns list of database names
- Used by BI tools for schema discovery
Get DB Schemas
List schemas within catalogs:
- Filters by catalog name (optional)
- Returns schema names with associated catalog
Get Tables
Discover available tables:
- Filter by catalog, schema, table name pattern
- Optional table type filtering
- Returns table metadata including:
- Catalog name
- Schema name
- Table name
- Table type (TABLE, TEMPORARY TABLE, VIEW)
- Optional serialized Arrow schema (when
include_schema=true
)
Get Table Types
List supported table types:
- TABLE
- TEMPORARY TABLE
- VIEW
SQL Info
Query server capabilities and configuration:
- Supported SQL features
- Data type mappings
- Server version information
Authentication
Pro Version
JWT Bearer token authentication:
http
Authorization: Bearer <jwt_token>
Free Version
No authentication required for the free version.
Client Examples
Python with PyArrow
python
import pyarrow.flight as flight
import pyarrow.flight.sql as flightsql
# Connect to FlightSQL server
location = flight.Location.for_grpc_tcp("localhost", 50250)
client = flightsql.FlightSqlClient(location)
# Execute a query
query = "SELECT * FROM topic_events LIMIT 100"
flight_info = client.execute(query)
# Fetch results
reader = client.do_get(flight_info.endpoints[0].ticket)
table = reader.read_all()
print(table.to_pandas())
# Prepared statement example
statement = client.prepare("SELECT * FROM topic_events WHERE value > ?")
statement.set_parameters([100])
flight_info = statement.execute()
reader = client.do_get(flight_info.endpoints[0].ticket)
results = reader.read_all()
JDBC Driver
java
import java.sql.*;
import org.apache.arrow.flight.sql.FlightSqlDriver;
// Register FlightSQL JDBC driver
Class.forName("org.apache.arrow.flight.sql.FlightSqlDriver");
// Connect
String url = "jdbc:arrow-flight-sql://localhost:50250";
Properties props = new Properties();
props.setProperty("user", "username");
props.setProperty("password", "token");
Connection conn = DriverManager.getConnection(url, props);
// Execute query
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM topic_events");
while (rs.next()) {
System.out.println(rs.getString("event_type"));
}
Data Type Mapping
FlightSQL uses Arrow data types which map to SQL types:
SQL Type | Arrow Type | Notes |
---|---|---|
VARCHAR | Utf8 | Variable-length strings |
INTEGER | Int32 | 32-bit integers |
BIGINT | Int64 | 64-bit integers |
DOUBLE | Float64 | Double precision |
TIMESTAMP | Timestamp | With timezone support |
BOOLEAN | Bool | True/false values |
BINARY | Binary | Raw bytes |
Query Limitations
- Read-only access: No INSERT/UPDATE/DELETE support via FlightSQL
- DuckDB SQL dialect: Queries use DuckDB SQL syntax
- Streaming results: Large result sets are streamed efficiently
- Timeout handling: Long-running queries have configurable timeouts
Performance Considerations
Efficient Querying
- Use filters to reduce data transfer
- Leverage DuckDB's columnar processing
- Results are streamed as Arrow RecordBatches
- Zero-copy data transfer where possible
Connection Pooling
- Reuse FlightSQL clients for multiple queries
- Prepared statements cache query plans
- Connection overhead is minimal with gRPC
Error Handling
Common error codes:
Status | Description |
---|---|
INVALID_ARGUMENT | Malformed SQL or invalid parameters |
UNAUTHENTICATED | Missing or invalid authentication |
PERMISSION_DENIED | Insufficient permissions for query |
NOT_FOUND | Table or schema doesn't exist |
INTERNAL | Server error during query execution |
Best Practices
- Use prepared statements for repeated queries with different parameters
- Close resources properly (prepared statements, result readers)
- Handle schema evolution - table schemas may change over time
- Set appropriate timeouts for long-running analytical queries
- Use connection pooling in production applications
Compatibility
FlightSQL in BoilStream is compatible with:
- Apache Arrow Flight SQL specification
- ADBC (Arrow Database Connectivity) drivers
- BI tools with FlightSQL connectors
- Custom applications using Arrow Flight SQL libraries
The implementation focuses on read-only analytical queries, making it ideal for:
- Business intelligence tools
- Data exploration
- Reporting applications
- Real-time dashboards