Data Analysis Tools

The worker containers come pre-installed with tools for navigating and analyzing civil engineering project data: Python libraries for IFC models, Excel tables, and PDF documents, plus command-line tools for search and data processing.

Python Libraries

A Python 3 virtual environment (/opt/venv) is available with the following packages:

IFC Models

Package	Purpose
IfcOpenShell	Read and query IFC models (supports IFC2X3 and IFC4 schemas)

Example agent usage:

python

import ifcopenshell
import ifcopenshell.util.element  # import the util submodule explicitly

model = ifcopenshell.open("/data/PID_Karavanke/.../model.ifc")
elements = model.by_type("IfcBuildingElementProxy")

for el in elements:
    # get_psets returns {pset_name: {property_name: value}}
    psets = ifcopenshell.util.element.get_psets(el)
    klasifikacija = psets.get("KAR_Klasifikacija", {})
    print(klasifikacija.get("ElementTip"), klasifikacija.get("Funkcija"))

IFC Schema Versions

The Karavanke dataset contains IFC files in two schema versions: IFC2X3 (209 files) and IFC4 (64 files). IfcOpenShell opens both through the same API, and the schema of a loaded model is available as `model.schema`.
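To tally schema versions across many files without loading each model, the STEP header can be inspected directly. The sketch below is an alternative to opening every file with IfcOpenShell, not a method described in this documentation: it reads the `FILE_SCHEMA` entry that IFC files in STEP format declare in their header. The helper names `read_ifc_schema` and `count_schemas` are hypothetical, and zipped or XML-encoded IFC variants are not handled.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the schema identifier in a STEP header, e.g. FILE_SCHEMA(('IFC2X3'));
_SCHEMA_RE = re.compile(r"FILE_SCHEMA\s*\(\s*\(\s*'([^']+)'", re.IGNORECASE)


def read_ifc_schema(path, max_bytes=65536):
    """Return the schema name declared in the file header, or None if absent."""
    with open(path, "rb") as fh:
        # The header sits at the start of the file, so a bounded read is enough.
        head = fh.read(max_bytes).decode("latin-1")
    match = _SCHEMA_RE.search(head)
    return match.group(1).upper() if match else None


def count_schemas(root):
    """Tally declared schema names over every .ifc file below root."""
    return Counter(read_ifc_schema(p) for p in Path(root).rglob("*.ifc"))
```

Calling `count_schemas("/data/")` would return a `Counter` keyed by schema name, which can be compared against the file counts stated above.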

Excel Tables

Package	Purpose
openpyxl	Read and write Excel files (.xlsx)
pandas	Tabular data analysis, filtering, aggregation

Example agent usage:

python

import pandas as pd

df = pd.read_excel("/data/.../ListaKampad_Elea.xlsx", sheet_name="Blockbuch")
kpp = df[df["Tip kampade"] == "KPP"]
print(f"Found {len(kpp)} KPP kampadas")
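Aggregation follows the same pattern as filtering. The sketch below counts rows per value of the `Tip kampade` column used in the example above; the helper name `count_by_type` and the in-memory sample table are illustrative assumptions, and `"XYZ"` is a made-up type rather than a value from the dataset.

```python
import pandas as pd


def count_by_type(df, column="Tip kampade"):
    """Return the number of rows per distinct value in `column`, most frequent first."""
    return df[column].value_counts()


# Illustrative in-memory table; "XYZ" is a made-up type, not one from the dataset.
sample = pd.DataFrame({"Tip kampade": ["KPP", "KPP", "XYZ"]})
counts = count_by_type(sample)
print(counts.to_dict())
```

With a real sheet, the DataFrame returned by `pd.read_excel(...)` would be passed in place of `sample`.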

PDF Documents

Package	Purpose
pdfplumber	Extract text and tables from PDF files
PyMuPDF (fitz)	Fast PDF rendering, text extraction, and OCR support
pytesseract	OCR engine for scanned documents

The system includes Tesseract OCR for handling scanned PDFs that don't contain selectable text.

Example agent usage:

python

import pdfplumber

with pdfplumber.open("/data/.../technical_report.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()      # empty or None for pages without a text layer
        tables = page.extract_tables()  # one list of rows per detected table
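For scanned pages, text extraction returns little or nothing, so an OCR fallback is needed. The following is a minimal sketch of one way to combine PyMuPDF and pytesseract for that purpose, not a description of how the agents actually do it. The helper names, the `min_chars` threshold, the 300 dpi rendering, and the default `lang="eng"` are all assumptions; whether additional Tesseract language data (for example Slovenian) is installed on the workers is not stated here.

```python
import io


def needs_ocr(text, min_chars=20):
    """Heuristic: treat a page as scanned if it yields almost no extractable text."""
    return text is None or len(text.strip()) < min_chars


def page_text_with_ocr(pdf_path, page_index, lang="eng", dpi=300):
    """Return the text layer of one page, falling back to Tesseract OCR."""
    # Imported lazily so the heuristic above works without the PDF stack installed.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    with fitz.open(pdf_path) as doc:
        page = doc[page_index]
        text = page.get_text()
        if not needs_ocr(text):
            return text
        # Render the page to a PNG in memory and pass it to Tesseract.
        pix = page.get_pixmap(dpi=dpi)
        image = Image.open(io.BytesIO(pix.tobytes("png")))
        return pytesseract.image_to_string(image, lang=lang)
```

Keeping the decision in a separate `needs_ocr` function makes the threshold easy to tune, since OCR is much slower than reading an existing text layer.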

System Tools

The following command-line tools are available for fast data navigation:

Tool	Purpose	Example
ripgrep (`rg`)	Fast text search across files	`rg "kampada A271" /data/ --type pdf`
jq	JSON processing and filtering	`jq '.elements[] | .name' output.json`
sqlite3	SQLite database queries	`sqlite3 /artefacts/state/session_registry.db '.tables'`
git	Version control (for agent workspace)	`git log --oneline`
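The same SQLite databases can be inspected from Python with the standard-library `sqlite3` module. The sketch below mirrors the `.tables` command from the table above; the helper name `list_tables` is hypothetical, and nothing is assumed about the actual tables in `session_registry.db`.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all user tables in a SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Note that `sqlite3.connect` creates an empty database file if the path does not exist, so it is worth checking the path first when only reading.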

ripgrep for Fast File Discovery

ripgrep is particularly useful for quickly finding relevant files across the 1,672-file dataset:

bash

# Find all files mentioning a specific kampada
rg -l "A271" /data/

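# Search for a term in PDF-extracted text
# (--type-add only defines the type; --type applies the filter)
rg -l "stropna plosca" /data/ --type-add 'txt:*.txt' --type txt

# Count matching lines per file
# (add --type xlsx to restrict the search to the type defined here)
rg -c "KPP" /data/ --type-add 'xlsx:*.xlsx'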
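When file discovery is driven from a Python script, ripgrep can be called through `subprocess`. This is a hedged sketch rather than the agents' actual mechanism: the helper names `build_rg_command` and `find_files` are hypothetical, and the pattern is treated as a literal string via `--fixed-strings`.

```python
import subprocess


def build_rg_command(pattern, root, glob=None):
    """Build an `rg` argument list that prints only the names of matching files."""
    cmd = ["rg", "--files-with-matches", "--fixed-strings"]
    if glob:
        cmd += ["--glob", glob]
    # "--" ends option parsing so patterns starting with "-" are safe.
    cmd += ["--", pattern, str(root)]
    return cmd


def find_files(pattern, root, glob=None):
    """Run ripgrep and return matching file paths (empty list if nothing matches)."""
    result = subprocess.run(
        build_rg_command(pattern, root, glob), capture_output=True, text=True
    )
    # ripgrep exits with 0 on matches, 1 on no matches, and 2 on errors.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

For example, `find_files("A271", "/data/", glob="*.txt")` corresponds to `rg -l -F --glob '*.txt' -- A271 /data/`.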

Tool Availability by Container

Tool	Controller	Worker
Node.js 22	Yes	Yes
Python 3 + venv	No	Yes
IfcOpenShell	No	Yes
pandas, openpyxl	No	Yes
pdfplumber, PyMuPDF	No	Yes
pytesseract + Tesseract	No	Yes
ripgrep, jq, sqlite3	Yes	Yes
git	Yes	Yes

INFO

All AI query execution happens on workers, which have the full toolchain. The controller only serves the web UI and routes requests.
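A script can verify at startup that it is running in a container with the expected toolchain. The sketch below uses only the standard library; the function name `check_toolchain` is hypothetical, and the import and executable names are assumptions derived from the packages and tools listed above (PyMuPDF imports as `fitz`, ripgrep is invoked as `rg`).

```python
import importlib.util
import shutil


def check_toolchain(modules, commands):
    """Map each module or command name to True if it can be found."""
    report = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    for name in commands:
        # which() searches PATH for an executable with this name.
        report[name] = shutil.which(name) is not None
    return report


# Import names assumed from the packages listed above (PyMuPDF imports as `fitz`).
WORKER_MODULES = ["ifcopenshell", "pandas", "openpyxl", "pdfplumber", "fitz", "pytesseract"]
WORKER_COMMANDS = ["rg", "jq", "git", "tesseract", "node"]

if __name__ == "__main__":
    for name, ok in check_toolchain(WORKER_MODULES, WORKER_COMMANDS).items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

On a controller, the Python module checks would be expected to fail, since the table lists the Python environment as worker-only.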
