Understanding our permit predictions
Our goal is to help you find good jobs before they come up. At its core, this is a prediction problem, and to predict well we first need to understand how the process works, especially for new construction. Below is a non-technical overview of our approach; technical details are in the appendix.
New construction is a great way to build intuition. Looking at all the lots where a filing was approved in 2022, we see that, across the major permit categories, these lots (labeled "New") receive 2x to 8x more permits than the average lot (labeled "Baseline"). This is part of the data we show on our homepage:

One way to think about this: defining leads based on new construction alone is a good start, but those leads are much more predictive for some work types than for others.
We then build on this by using a much wider range of data to predict permits:
- Location
- Filing characteristics (unit size, etc.)
- Lot characteristics
We use a standard predictive model and validate the results against data from the following year. For a fair comparison, we generate exactly as many leads as the "new construction" method does. It turns out these model-based leads perform much, much better, outperforming the "New" method by a factor of 6x to 10x, and the Baseline by 22x to 51x.

Appendix
Dataset
Our master dataset has one row per tax lot and year (2021-2025), which we call a lot-year. For each row, we merge on filing data from that lot-year and permit data from the following year, so that permits in year n+1 can be predicted from filing data in year n.
Approximate data size:
- Tax Lots: 850K
- Filings: 381K (DOB + DOB NOW)
- Permits: 599K (DOB NOW, 2021+)
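
To make the lot-year construction concrete, here is a minimal pandas sketch; the file names and columns (bbl, year, lot_area) are illustrative assumptions, not our actual schema:

```python
import pandas as pd

# Hypothetical inputs: one row per tax lot, plus filing and permit
# tables keyed by lot id (bbl) and year.
lots = pd.read_csv('tax_lots.csv')      # bbl, lot_area, ...
filings = pd.read_csv('filings.csv')    # bbl, year, ...
permits = pd.read_csv('permits.csv')    # bbl, year, ...

rows = []
for year in range(2021, 2025):  # years with an observable n+1 target
    lot_year = lots.assign(year=year)

    # Filing features from year n (here, just a count per lot).
    filing_counts = (filings.loc[filings['year'] == year]
                     .groupby('bbl').size()
                     .rename('filings_this_year').reset_index())
    lot_year = lot_year.merge(filing_counts, on='bbl', how='left')

    # Target: permit count from year n+1.
    permit_counts = (permits.loc[permits['year'] == year + 1]
                     .groupby('bbl').size()
                     .rename('permits_next_year').reset_index())
    lot_year = lot_year.merge(permit_counts, on='bbl', how='left')
    rows.append(lot_year)

lot_years = pd.concat(rows, ignore_index=True)
for col in ['filings_this_year', 'permits_next_year']:
    lot_years[col] = lot_years[col].fillna(0)
```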
We then run a LightGBM model in Python with the parameters below:

```python
params = {
    'objective': 'regression',
    'metric': 'rmse',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
}
```

We fully separate training and test data: a model trained on data from year n is evaluated against data from year n+1.
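
As a rough sketch of the training step, continuing the lot-year frame from the dataset sketch above; the feature list and num_boost_round value are illustrative assumptions, not our production settings:

```python
import lightgbm as lgb

# Train on year n, evaluate on year n+1 (years here are illustrative).
train = lot_years[lot_years['year'] == 2021]
test = lot_years[lot_years['year'] == 2022].copy()

features = ['filings_this_year', 'lot_area']  # placeholder feature set

dtrain = lgb.Dataset(train[features], label=train['permits_next_year'])
model = lgb.train(params, dtrain, num_boost_round=500)

# Predicted score for each test-year lot.
test['score'] = model.predict(test[features])
```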
Our key performance metric is the average number of permits among leads in the test sample.
We compare the performance of three methods, all evaluated on the test data:
- Baseline: The average number of permits (AP) in the sample
- New: AP among the tax-lots with an approved filing in that year. We call the number of such lots N.
- Model: AP among the N tax-lots with the highest predicted model score.
We fix the number of leads at N for both the New and Model methods to run a fair comparison: this ensures AP is driven by lead quality, not lead count.
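
Continuing the sketch above, the three metrics could be computed roughly as follows (treating any filing as approved, for illustration):

```python
# Baseline: average permits across all test-year lots.
baseline_ap = test['permits_next_year'].mean()

# New: average permits among lots with a filing that year.
new_lots = test[test['filings_this_year'] > 0]
new_ap = new_lots['permits_next_year'].mean()
n = len(new_lots)  # number of leads, N

# Model: average permits among the N highest-scoring lots,
# holding the lead count fixed at N for a fair comparison.
model_ap = test.nlargest(n, 'score')['permits_next_year'].mean()
```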