Step 2: Data Ingestion

Connect to data sources, validate data quality, and establish reliable ingestion patterns.

Chicago Open Data Analysis

Discovery: Chicago publishes 600+ datasets through its Open Data Portal, but only 12 were relevant to small business analysis.

Key Datasets Identified:

  • Business Licenses - 200K+ active licenses, updated daily
  • Building Permits - Construction and renovation permits
  • Business License Applications - New business tracking
  • Zoning Data - Commercial district boundaries

API Integration Strategy

Socrata Open Data API (SODA)

Chicago uses Socrata for its data portal. Key implementation details:

# Basic API exploration
import requests
import pandas as pd

# Business Licenses endpoint
base_url = "https://data.cityofchicago.org/resource/"
business_licenses = "r5kz-chrr.json"

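# Test API connection and data structure
response = requests.get(f"{base_url}{business_licenses}",
                        params={"$limit": 10}, timeout=30)
response.raise_for_status()  # stop early on HTTP errors

# Examine data schema. SODA JSON records can omit fields whose value is
# null, so take the union of keys across the sample, not just record 0.
sample_data = response.json()
sample_df = pd.DataFrame(sample_data)
print(f"Columns: {list(sample_df.columns)}")
print(f"Sample record count: {len(sample_df)}")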

✅ API Advantages

  • Real-time data access
  • SQL-like filtering ($where)
  • JSON response format
  • No authentication required
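
The SQL-like filtering listed above can be sketched as a small helper that assembles SODA query parameters. This is a minimal sketch rather than the project's actual code: build_query is a hypothetical helper, and the column names (license_status, date_issued, license_id, legal_name) and the 'AAI' status value are assumptions that should be checked against the dataset's schema before use.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofchicago.org/resource/"
DATASET = "r5kz-chrr.json"  # Business Licenses endpoint from the example above


def build_query(where, select=None, order=None, limit=1000):
    """Assemble SODA query parameters (hypothetical helper)."""
    params = {"$where": where, "$limit": limit}
    if select:
        params["$select"] = ",".join(select)
    if order:
        params["$order"] = order
    return params


# Column names and the 'AAI' status value are assumptions for illustration.
params = build_query(
    where="license_status = 'AAI' AND date_issued >= '2023-01-01'",
    select=["license_id", "legal_name", "date_issued"],
    order="date_issued DESC",
    limit=500,
)

# urlencode escapes the "$" prefix, quotes, and spaces so the filter is URL-safe.
url = f"{BASE_URL}{DATASET}?{urlencode(params)}"
print(url)
```

The same params dictionary can also be passed straight to requests.get(url, params=params), which performs the encoding itself.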

⚠️ API Limitations

  • 1,000-record default limit
  • Rate limiting (unclear docs)
  • Inconsistent data types
  • No change logs
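
One way to work around the record limit and the rate limiting is to page through results with $limit and $offset and to back off when the API throttles. The sketch below is an illustration under assumptions, not the pipeline's actual code: fetch_page and fetch_all are hypothetical helpers, the retry count and delays are arbitrary, and treating HTTP 429 as the throttling signal is an assumption to confirm against the portal's behavior. Ordering by the :id system field keeps pages stable between requests.

```python
import time

import requests


def fetch_page(url, params, retries=3, backoff=2.0):
    """Fetch one page, retrying with exponential backoff on HTTP 429."""
    for attempt in range(retries + 1):
        response = requests.get(url, params=params, timeout=30)
        if response.status_code == 429 and attempt < retries:
            time.sleep(backoff * 2 ** attempt)
            continue
        response.raise_for_status()
        return response.json()


def fetch_all(url, page_size=1000, max_pages=None, get_page=fetch_page, pause=0.0):
    """Collect every record by advancing $offset until a short page arrives."""
    rows, offset, pages = [], 0, 0
    while True:
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        page = get_page(url, params)
        rows.extend(page)
        pages += 1
        # A page shorter than page_size means the dataset is exhausted.
        if len(page) < page_size or (max_pages is not None and pages >= max_pages):
            break
        offset += page_size
        time.sleep(pause)  # optional courtesy delay between requests
    return rows


# Offline demonstration with made-up rows standing in for the network call.
fake_rows = [{"id": i} for i in range(2500)]


def fake_get_page(url, params):
    start = params["$offset"]
    return fake_rows[start:start + params["$limit"]]


rows = fetch_all("unused", get_page=fake_get_page)
print(len(rows))  # 2500
```

Against the live API, a call such as fetch_all(f"{base_url}{business_licenses}", max_pages=3) would cap the download while testing.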

Framework Progress

  • Step 1: Scope & Strategy
  • Step 2: Data Ingestion (current)
  • Step 3: Transform & Model
  • Step 4: Load & Validate
  • Step 5: Visualize & Report
  • Step 6: Automate & Scale

Key Lesson

Spending 1 week on data quality assessment prevented 3 weeks of debugging later. Always validate data assumptions before building pipelines.
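
A data quality assessment of this kind can start with a small profiling pass over a sample of records. The following is a minimal sketch under assumptions, not the assessment actually used: profile_records is a hypothetical helper, the records are made-up toy data, and the column names are illustrative. It reports the share of missing values per column and counts values that fail to parse as the expected numeric or date type.

```python
import pandas as pd


def profile_records(records, numeric_cols=(), date_cols=()):
    """Summarize missing values and failed type coercions per column."""
    df = pd.DataFrame(records)
    report = pd.DataFrame({
        "missing_pct": df.isna().mean().mul(100).round(1),
        "unique": df.nunique(),
        "coerce_failures": 0,
    })
    parsers = [(col, pd.to_numeric) for col in numeric_cols]
    parsers += [(col, pd.to_datetime) for col in date_cols]
    for col, parse in parsers:
        parsed = parse(df[col], errors="coerce")
        # Count values that were present but could not be converted.
        failed = parsed.isna() & df[col].notna()
        report.loc[col, "coerce_failures"] = int(failed.sum())
    return report


# Made-up sample: one malformed ZIP code, one missing ZIP code, one bad date.
records = [
    {"license_id": "1001", "zip_code": "60601", "date_issued": "2023-01-15T00:00:00.000"},
    {"license_id": "1002", "zip_code": "6060A", "date_issued": "2023-02-01T00:00:00.000"},
    {"license_id": "1003", "date_issued": "not a date"},
]

report = profile_records(
    records,
    numeric_cols=["license_id", "zip_code"],
    date_cols=["date_issued"],
)
print(report)
```

Running a report like this on each candidate dataset before building transformations makes type and completeness problems visible early.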

Tools & Libraries

Python Requests, Pandas, Socrata API, Data Profiling