
Computations and filtering

This document describes what the Apt Hunter pipeline and dashboard compute, and how each value is derived. Implementation lives mainly under pipeline/ (Node) and app/apartments/page.tsx (dashboard).

Pipeline flow (order of operations)

When you run node pipeline/index.js (or npm run scrape:dry), the orchestrator in pipeline/index.js roughly does:

1. Fetch: StreetEasy (pipeline/scraper.js) and Zumper (pipeline/zumper-scraper.js) in parallel; dedupe by listing id.
2. Seen filter: Unless --all is passed, keep only listings whose id is not in .seen-listings.json (filterNew). After a successful non-dry run, processed listings are marked seen (markSeen).
3. Enrich: If GOOGLE_MAPS_API_KEY is set (pipeline/enrich.js): geocode, transit minutes to commute destination(s), Places reviews/summary. If the key is missing, listings pass through unchanged (the log warns).
4. Score: scoreListings or scoreListingsWithRoommate (pipeline/scorer.js).
5. Dismissed: Remove IDs in .dismissed.json (filterDismissed).
6. Availability filter: filterByAvailability (pipeline/availability.js) using profile.availableBy and a ±30 day window (see below).
7. Price history: Append today’s price per id to .price-history.json; price drops compare the current price to the max price recorded on any prior calendar day (detectPriceDrops).
8. Neighborhood median context: computeMedianPriceContext adds per-listing fields vs the median rent in the current batch for that neighborhood string.
9. Detail scrape: Optional open house / broker fields (pipeline/detail-scraper.js), unless --dummy.
10. Write results.json: Top listings for the UI (see profile.delivery.topN; the file also caps stored rows for the dashboard).

--rescore re-reads existing results.json listings, re-stamps daysOnMarket from the seen store, and re-runs scoring + availability + median without scraping.

Scraper-level constraints (before scoring)

| Source | How the search is constrained |
|--------|-------------------------------|
| StreetEasy | Listing URL uses price:-{maxRent} and beds:1,2 (fixed 1–2 BR in code today), sort_by=listed_desc. |
| Zumper | Query uses max_price, min_beds / max_beds from profile.minBedrooms / profile.maxBedrooms, and no_fee=true. |

So bedroom coverage can differ by source: Zumper follows your profile; StreetEasy is always 1–2 BR in the current implementation.

Hard exclusions in `scoreListings`

These conditions drop the listing from scored results (no row in the pipeline output for that run):

| Rule | Condition |
|------|-----------|
| Over budget | Parsed or structured price > profile.maxRent. |
| Laundry requirement | requirements.laundry === 'in-building' and detected laundry is none (no in-unit / in-building signal in title, description, tags, or scraper field). |

Not hard-filtered in scoreListings (but affect must-have % and roommate checks):

Bedroom count — Must match minBedrooms–maxBedrooms for a full must-have set; wrong count still gets a total score today.

requirements.fullBath — Present in profile.json / TypeScript types but not referenced in scorer.js; no pipeline enforcement.

Move-in date — Handled in must-have % inside scoring and again in filterByAvailability after scoring (see Availability).

Total score (`score.total`)

The total is the sum of additive components minus penalties (clamped so the total does not go below 0 after a penalty). There is no fixed “out of 100” cap in code; logged “/100” in the CLI is informal.

Additive components

| Component | Points | How it is calculated |
|-----------|--------|----------------------|
| Laundry | 0–25 | in-unit → 25; in-building → 15; unknown → 8; none → 0 (but none + in-building requirement already excluded). Uses scraper listing.laundry if set and not unknown, else regex on full text (detectLaundry). |
| Elevator | 0 or 10 | 10 only if preferences.elevator is true and elevator detected (listing.elevator or detectElevator on text). |
| Neighborhood commute | 0–20 | From static table pipeline/neighborhoods.js: first alias match on neighborhood + text → hood.commute (authoring: “ease to Midtown/Flatiron-style commute”, not live routing). |
| Neighborhood quality | 0–20 | Same match → hood.quality. Unknown neighborhood → commute 8, quality 8. |
| Price value | 0–15 | priceValueScore(price, maxRent): ratio $r = price / maxRent$. r ≤ 0.6 → 15; ≤ 0.7 → 12; ≤ 0.8 → 9; ≤ 0.9 → 6; else 3. Missing price → 5. |
| Freshness | 0–10 | If daysOnMarket is set: 0 days → 10; 1 → 7; ≤3 → 5; ≤7 → 2; else 0. Else fallback from pubDate age in hours (under 2h → 10, under 6h → 8, under 12h → 6, under 24h → 4, under 48h → 2, else 0). |
| Floor | 0, 3, or 6 | Floor from listing.floor or detectFloor (regex). ≥8 → +6; ≥4 → +3; else 0. |
| Outdoor space | 0, 4, or 8 | private → +8; shared → +4; from listing.outdoorSpace or detectOutdoorSpace on text. |
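The price value and freshness rows can be read as two small step functions. The sketch below follows the thresholds stated in the table; the parameter `pubDateAgeHours` and the choice to return 0 when neither date signal is present are assumptions for illustration, not confirmed details of scorer.js.

```javascript
// Price value: ratio of listing price to the profile's maxRent (3 to 15 points, 5 if unpriced).
function priceValueScore(price, maxRent) {
  if (price == null) return 5; // missing price gets the stated default
  const r = price / maxRent;
  if (r <= 0.6) return 15;
  if (r <= 0.7) return 12;
  if (r <= 0.8) return 9;
  if (r <= 0.9) return 6;
  return 3;
}

// Freshness: prefer daysOnMarket; otherwise fall back to pubDate age in hours.
function freshnessScore(daysOnMarket, pubDateAgeHours) {
  if (daysOnMarket != null) {
    if (daysOnMarket === 0) return 10;
    if (daysOnMarket === 1) return 7;
    if (daysOnMarket <= 3) return 5;
    if (daysOnMarket <= 7) return 2;
    return 0;
  }
  if (pubDateAgeHours == null) return 0; // assumption: no date signal at all scores 0
  if (pubDateAgeHours < 2) return 10;
  if (pubDateAgeHours < 6) return 8;
  if (pubDateAgeHours < 12) return 6;
  if (pubDateAgeHours < 24) return 4;
  if (pubDateAgeHours < 48) return 2;
  return 0;
}
```

For example, a $3,000 listing against a $4,000 maxRent has r = 0.75 and lands in the 9-point band.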

Penalties

| Penalty | Effect | Rule |
|---------|--------|------|
| Stale listing | −10 (after clamp) | daysOnMarket ≥ 30 or, if DOM missing, pubDate older than 30 days. Sets listing.stale and a warning string. |
| Single photo | −5 | If photoCount ≤ 1. |
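A minimal sketch of how the two penalties might be applied to the additive subtotal. The function name `applyPenalties`, the `pubDateAgeDays` parameter, and clamping once at the end are assumptions; the text only says the total does not go below 0 after a penalty.

```javascript
// Applies the stale and single-photo penalties to an additive subtotal.
// Hypothetical helper: name, parameters, and the single final clamp are assumptions.
function applyPenalties(subtotal, { daysOnMarket, pubDateAgeDays, photoCount } = {}) {
  let total = subtotal;
  // Stale: use daysOnMarket when present, else fall back to pubDate age.
  const stale = daysOnMarket != null ? daysOnMarket >= 30 : pubDateAgeDays > 30;
  if (stale) total -= 10;
  if (photoCount <= 1) total -= 5; // undefined photoCount compares false, so no penalty
  return { total: Math.max(0, total), stale: Boolean(stale) };
}
```

A listing with a subtotal of 50, 45 days on market, and one photo would end at 35 and be flagged stale.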

Sort order after scoring

results.sort: primary score.total descending; tie-break score.breakdown.freshness descending.
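The ordering above can be expressed as a two-key comparator. This is a sketch; the name `compareByScore` is hypothetical, and only the field paths score.total and score.breakdown.freshness come from the text.

```javascript
// Sort by total descending, then by the freshness component descending.
function compareByScore(a, b) {
  const byTotal = b.score.total - a.score.total;
  if (byTotal !== 0) return byTotal;
  return b.score.breakdown.freshness - a.score.breakdown.freshness;
}

const sample = [
  { id: 1, score: { total: 70, breakdown: { freshness: 2 } } },
  { id: 2, score: { total: 85, breakdown: { freshness: 0 } } },
  { id: 3, score: { total: 70, breakdown: { freshness: 7 } } },
];
const orderedIds = [...sample].sort(compareByScore).map((l) => l.id);
```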

Derived percentages (informational on each listing)

These do not add to score.total; they are computed for UI / digest context.

`mustHavesPct`

Four checks (the roommate path runs its own set of 3 or 4 checks; see Roommate mode):

1. Price: price === null || price <= maxRent
2. Bedrooms: minBedrooms ≤ bedrooms ≤ maxBedrooms
3. Laundry: laundry !== 'none'
4. Availability: if profile.availableBy is set, parse move-in from text (parseAvailability); must fall in [target − 30d, target + 30d] or parsed date is null (treated as pass). If availableBy is unset, this check is true.

Formula: round(100 * (count of true) / 4).
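As a sketch, the four checks and the formula might look like this. The function names and the idea of passing in an already-parsed availability date are assumptions for illustration; the real scorer may be structured differently.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when there is no target, no parsed date, or the date is within 30 days of the target.
function availabilityOk(availableBy, parsedDate) {
  if (!availableBy || parsedDate === null) return true;
  const target = new Date(availableBy).getTime();
  return Math.abs(parsedDate.getTime() - target) <= 30 * DAY_MS;
}

// Hypothetical helper: parsedDate stands in for the result of parseAvailability (Date or null).
function mustHavesPct(listing, profile, parsedDate) {
  const checks = [
    listing.price === null || listing.price <= profile.maxRent,
    listing.bedrooms >= profile.minBedrooms && listing.bedrooms <= profile.maxBedrooms,
    listing.laundry !== 'none',
    availabilityOk(profile.availableBy, parsedDate),
  ];
  return Math.round((100 * checks.filter(Boolean).length) / 4);
}
```

A listing that passes price, laundry, and availability but has three bedrooms against a 1–2 range would show 75%.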

`niceToHavesPct`

Four checks:

1. laundry === 'in-unit'
2. hasElevator
3. price <= maxRent * 0.8
4. hood.quality >= 16

Formula: round(100 * (count of true) / 4).
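A minimal sketch of the same calculation. The flat argument shape (with `hoodQuality` standing in for hood.quality) and the explicit guard against a missing price are assumptions; the text does not say how an unpriced listing is treated here.

```javascript
// Hypothetical helper: counts the four nice-to-have checks and converts to a percentage.
function niceToHavesPct({ laundry, hasElevator, price, hoodQuality }, maxRent) {
  const checks = [
    laundry === 'in-unit',
    hasElevator === true,
    price != null && price <= maxRent * 0.8, // assumption: unpriced listings fail this check
    hoodQuality >= 16,
  ];
  return Math.round((100 * checks.filter(Boolean).length) / 4);
}
```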

`commutePct`

If transitMinutes is set (Google Directions):

≤20 → 100%; ≤30 → 85%; ≤40 → 70%; ≤50 → 55%; else 40%.

Else: round((hood.commute / 20) * 100) using static neighborhood commute score.
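The two branches can be sketched as one function. The signature is an assumption; `hoodCommute` stands in for the static hood.commute score (0–20).

```javascript
// Live transit minutes win when present; otherwise scale the static 0-20 commute score.
function commutePct(transitMinutes, hoodCommute) {
  if (transitMinutes != null) {
    if (transitMinutes <= 20) return 100;
    if (transitMinutes <= 30) return 85;
    if (transitMinutes <= 40) return 70;
    if (transitMinutes <= 50) return 55;
    return 40;
  }
  return Math.round((hoodCommute / 20) * 100);
}
```

With no transit data, an unknown neighborhood (commute 8) maps to 40%, the same value as a live commute over 50 minutes.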

Move-in cost estimate

When price exists:

Security deposit = one month rent (assumed).

Broker fee = 0 if noFee === true, one month if noFee === false, else 0 (unknown treated as no extra fee in the sum).

moveInCostEstimate = first month + security + broker component.
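A sketch of the estimate under the stated assumptions (one month of security, broker fee of one month only when noFee is explicitly false). Returning null for an unpriced listing is an assumption.

```javascript
// First month + one month security + broker fee (only when noFee === false).
function moveInCostEstimate(price, noFee) {
  if (price == null) return null; // assumption: no estimate without a price
  const security = price;
  const brokerFee = noFee === false ? price : 0; // unknown noFee adds nothing
  return price + security + brokerFee;
}
```

For a $3,000 listing this gives $6,000 when no-fee or unknown, and $9,000 when a fee is known to apply.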

Roommate mode (`scoreListingsWithRoommate`)

If profile.json includes a top-level roommate object:

1. Listings are scored with scoreListings using the primary profile (same hard filters).
2. checkRoommateCompat runs for each surviving row against roommate’s maxRent, bedroom range, laundry rule, and optionally roommate.availableBy (±30 day window, same as above).

roommateScore.mustHavesPct: round(100 * passed / totalChecks) with 3 or 4 checks depending on roommate availableBy.

roommateScore.passes: all of price, beds, laundry true, and if availableBy set, availability check must pass too.

compat: 'both' if passes, else 'paula-only' (legacy internal label; UI may show generic “you / roommate” copy).
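The roommate check might be sketched as follows. The function name `roommateCompatSketch`, the precomputed `availabilityPasses` flag, and treating the laundry rule as "laundry is not none" are assumptions; only the output fields and the 'both' / 'paula-only' labels come from the text.

```javascript
// Hypothetical stand-in for checkRoommateCompat: 3 checks, or 4 when roommate.availableBy is set.
function roommateCompatSketch(listing, roommate, availabilityPasses = true) {
  const checks = [
    listing.price == null || listing.price <= roommate.maxRent,
    listing.bedrooms >= roommate.minBedrooms && listing.bedrooms <= roommate.maxBedrooms,
    listing.laundry !== 'none', // assumption: same laundry rule as the primary profile
  ];
  if (roommate.availableBy) checks.push(availabilityPasses);
  const passed = checks.filter(Boolean).length;
  const passes = passed === checks.length;
  return {
    mustHavesPct: Math.round((100 * passed) / checks.length),
    passes,
    compat: passes ? 'both' : 'paula-only',
  };
}
```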

Neighborhood median price (`computeMedianPriceContext`)

For the current array of listings passed in (after availability filtering):

1. Group price values by listing.neighborhood (non-unknown; unknown buckets as __other__ for grouping only).
2. Median per group: sort prices; odd length → middle; even → average of two middles, rounded.
3. For each priced listing with ≥2 listings in its group, set:
   - hoodMedianPrice
   - vsMedianPrice = price - median (negative = below median)
   - vsMedianPct = round(100 * (price - median) / median)

Important: this is not a market-wide median — only over the current batch in that run (and neighborhood string as assigned by the scorer).
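The grouping and median steps can be sketched as below. The function name `addMedianContext`, counting only priced listings toward the group size, and treating the __other__ bucket like any other group are assumptions made for illustration.

```javascript
// Median of a non-empty price list; even-length lists average the two middles, rounded.
function median(prices) {
  const sorted = [...prices].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

// Hypothetical stand-in for computeMedianPriceContext.
function addMedianContext(listings) {
  const keyOf = (l) => (l.neighborhood && l.neighborhood !== 'unknown' ? l.neighborhood : '__other__');
  const groups = new Map();
  for (const l of listings) {
    if (l.price == null) continue;
    if (!groups.has(keyOf(l))) groups.set(keyOf(l), []);
    groups.get(keyOf(l)).push(l.price);
  }
  return listings.map((l) => {
    const prices = groups.get(keyOf(l));
    if (l.price == null || !prices || prices.length < 2) return l; // too few comparables
    const m = median(prices);
    return {
      ...l,
      hoodMedianPrice: m,
      vsMedianPrice: l.price - m,
      vsMedianPct: Math.round((100 * (l.price - m)) / m),
    };
  });
}
```

With three Astoria listings at $3,000, $3,200, and $3,400, the median is $3,200 and the cheapest one is $200 (about 6%) below it.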

Availability parsing and filtering

`parseAvailability(text)` (`pipeline/availability.js`)

Returns a Date or null, using case-insensitive patterns such as:

“available immediately / now / right away”, “move in now” → today

“available MM/DD…”

“available {Month} {day}” (year defaults to current calendar year)

“available {Month} {YYYY}”

`filterByAvailability(listings, targetDateStr, windowDays = 30)`

If targetDateStr is missing → no filtering.

Otherwise keep a listing if:

Parsed availability is null (unknown → keep), or

Parsed date ∈ [target − 30 days, target + 30 days] (inclusive).

This runs after scoring; it can remove rows that were scored.
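A minimal sketch of the window filter. It assumes each listing already carries a parsed `availableDate` (Date or null); that field name is hypothetical, and the real function may call parseAvailability on the listing text instead.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Keeps listings with unknown availability or a date within ±windowDays of the target (inclusive).
function filterByAvailabilitySketch(listings, targetDateStr, windowDays = 30) {
  if (!targetDateStr) return listings; // no target date means no filtering
  const target = new Date(targetDateStr).getTime();
  const windowMs = windowDays * DAY_MS;
  return listings.filter(
    (l) => l.availableDate == null || Math.abs(l.availableDate.getTime() - target) <= windowMs
  );
}
```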

Price history and drops (`pipeline/price-history.js`)

Store: .price-history.json, object keyed by listing id → array of { date, price } (trimmed to last 30 entries per id on save).

Record: For each listing with id + price, append an entry if the latest entry is not same price and same calendar day.

Drop detection: For “today”, look at entries with date < today. If max(previous prices) > current price, emit a drop with

dropPct = round(100 * (old - new) / old).
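Drop detection for a single listing can be sketched like this. It assumes dates are stored as ISO `YYYY-MM-DD` strings (so string comparison orders them correctly) and uses hypothetical names for the function and the returned fields other than dropPct.

```javascript
// history: array of { date: 'YYYY-MM-DD', price } for one listing id.
// Returns null when there is no drop, else the old/new prices and rounded percentage.
function detectPriceDropSketch(history, today) {
  const todays = history.filter((e) => e.date === today);
  const prior = history.filter((e) => e.date < today).map((e) => e.price);
  if (todays.length === 0 || prior.length === 0) return null;
  const current = todays[todays.length - 1].price;
  const old = Math.max(...prior); // highest price seen on any earlier calendar day
  if (old <= current) return null;
  return { oldPrice: old, newPrice: current, dropPct: Math.round((100 * (old - current)) / old) };
}
```

A listing recorded at $3,500, then $3,400, then $3,200 today reports a drop from $3,500 to $3,200, which rounds to 9%.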

Enrichment-only fields (not part of `score.total`)

When Maps is enabled (pipeline/enrich.js + detail scraper), listings may gain:

transitMinutes, geocoding, review stats, rent stabilization hint (PLUTO), HPD violations / bedbug flags, housing court litigation counts, open house, broker contact, etc.

These feed dashboard filters and badges but are separate from the numeric score unless a future rule references them.

Dashboard-only behavior (`/apartments`)

Filters (chip bar)

All are client-side on the loaded listing set; they do not change results.json.

Examples: 1BR/2BR, in-unit laundry, elevator, rent-stab flags, HPD clean, no bedbugs, no fee, no litigation, photo count ≥ 5, open house, pipeline status, stale, favorites, high floor (≥4), outdoor space, has notes, etc. Exact predicates mirror page.tsx (search for filter ===).

Sort order (full)

The list is sorted in page.tsx with this precedence (all client-side):

1. Open house: Listings with openHouse set float to the top.
2. Pipeline status: Higher PIPELINE_PRIORITY first, in the order applied (7) → offered → toured → showing → contacted → interested → none (1) → passed (0).
3. Roommate mode: If viewMode === 'roommate' and a roommate profile is active: roommateScore.mustHavesPct descending, then price ascending.
4. Chosen sort chip: Otherwise: score (score.total desc), price_asc / price_desc, fresh (pubDate desc), commute_asc (transitMinutes asc; missing → large sentinel), value_asc (vsMedianPct asc; more negative = further below median).

When roommate mode is on, the sort chip row is disabled (sortDisabled); step 3 applies instead of step 4.
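The precedence above can be sketched as a single comparator. The intermediate PIPELINE_PRIORITY values are inferred from the stated order (only 7, 1, and 0 appear in the text), the `status` field name is hypothetical, and only three of the sort chips are shown.

```javascript
// Values between 7 and 1 are inferred from the documented order.
const PIPELINE_PRIORITY = {
  applied: 7, offered: 6, toured: 5, showing: 4,
  contacted: 3, interested: 2, none: 1, passed: 0,
};
const MISSING = Number.MAX_SAFE_INTEGER; // sentinel for missing transitMinutes

function compareForDashboard(a, b, { roommateMode = false, sortKey = 'score' } = {}) {
  // 1. Open house listings float to the top.
  const openHouse = Number(Boolean(b.openHouse)) - Number(Boolean(a.openHouse));
  if (openHouse !== 0) return openHouse;
  // 2. Higher pipeline priority first.
  const status = PIPELINE_PRIORITY[b.status ?? 'none'] - PIPELINE_PRIORITY[a.status ?? 'none'];
  if (status !== 0) return status;
  // 3. Roommate mode replaces the sort chip.
  if (roommateMode) {
    const pct = b.roommateScore.mustHavesPct - a.roommateScore.mustHavesPct;
    return pct !== 0 ? pct : a.price - b.price;
  }
  // 4. Chosen sort chip (subset).
  switch (sortKey) {
    case 'price_asc': return a.price - b.price;
    case 'commute_asc': return (a.transitMinutes ?? MISSING) - (b.transitMinutes ?? MISSING);
    default: return b.score.total - a.score.total;
  }
}
```

Note that a listing marked passed still sorts first if it has an open house, because step 1 is evaluated before pipeline status.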

Income qualifier (display only)

incomeQualifies(rent, income) in page.tsx:

No income → unknown

maxRentFor40x = income / 40 (40× rent rule)

rent ≤ maxRentFor40x → ok

rent ≤ maxRentFor40x * 1.5 → borderline

else → guarantor
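Written out, the tiers above amount to a short function. This is a sketch of the described logic, assuming `income` is annual and `rent` is monthly (which is what the 40× rule implies); the exact implementation in page.tsx may differ.

```javascript
// Returns 'unknown' | 'ok' | 'borderline' | 'guarantor' per the 40x rent rule.
function incomeQualifies(rent, income) {
  if (!income) return 'unknown';
  const maxRentFor40x = income / 40; // annual income divided by 40 gives max monthly rent
  if (rent <= maxRentFor40x) return 'ok';
  if (rent <= maxRentFor40x * 1.5) return 'borderline';
  return 'guarantor';
}
```

At a $120,000 income, rent up to $3,000 is ok, up to $4,500 is borderline, and anything higher suggests a guarantor.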

Neighborhood stats panel

Aggregates currently loaded non-dismissed listings: count, average price, average score, average transit, median days on market — computed in the browser, independent of pipeline medians.

Quick reference: files

| Concern | File(s) |
|---------|---------|
| Orchestration | pipeline/index.js |
| Scoring & medians | pipeline/scorer.js |
| Static neighborhood scores | pipeline/neighborhoods.js |
| Seen / DOM | pipeline/store.js |
| Availability | pipeline/availability.js |
| Price history | pipeline/price-history.js |
| Dismissed | pipeline/dismissed.js |
| Dashboard filters/sorts | app/apartments/page.tsx |

Product Requirements Document: AptHunter MVP

Version: 2.0 (Post-Engineering Review)
Author: Product
Last Updated: March 2026
Target Launch: June 2026 (90-day development window)
Status: Final (Approved for Development)

Changelog from v1.0: Database migration timeline corrected (1 week → 2.5 weeks), Google Maps caching strategy explicitly defined, RentHop API dependency removed pending confirmation, auth locked to Clerk, Stripe billing scope expanded, listing expiration signal removed from digest, waitlist landing page added as day-1 pre-launch task, onboarding address field updated to use autocomplete.

1. Overview

Product Vision

AptHunter is the AI-first apartment search tool for renters in high-friction urban markets. Where incumbents (StreetEasy, Zillow) optimize for listing volume and broker relationships, AptHunter optimizes for renter outcomes — finding the best apartment for *your* criteria, faster, with information that currently requires hours of manual research.

The core promise to a user: *"Tell us what matters to you. We'll score every listing against it and tell you when to move."*

Problem Statement

A renter actively searching for an apartment in NYC spends an estimated 3–5 hours per week across manual research tasks: checking multiple listing platforms, looking up commute times, researching broker fee status, assessing rent stabilization, and tracking price changes in a personal spreadsheet. This is unstructured, error-prone work that provides no compounding advantage — every new listing restarts the process.

No current product addresses this holistically. Partial solutions exist but none combine them into a coherent renter workflow.

Success Metrics (90-day targets)

| Metric | Target |
|---|---|
| Waitlist sign-ups before launch | 200 |
| Registered users at launch | 500 |
| Paying subscribers (Pro) | 100 |
| MRR at end of month 3 | $1,500 |
| Average session length | > 8 minutes |
| D7 retention | > 40% |
| NPS | > 40 |

2. User Personas

Primary: The Active NYC Renter

Profile: 25–38, professional, relocating or lease-ending in 60–90 days. Has income to be selective but no time to waste. Searches after work, stressed by the volume of options and the speed of the market.

Jobs to be done:

Find apartments that match *my specific* criteria without manually checking 5 platforms

Know immediately whether a listing is worth my time (commute, no-fee, rent-stab status)

Get alerted when something new or something better appears

Track what I've seen and dismissed without a personal spreadsheet

Secondary: The NYC Mover with a Roommate

Profile: Same as above but coordinating with a roommate or partner with different criteria. *(Roommate collaborative mode is post-MVP but the data model should not preclude it.)*

3. MVP Feature Set

Features marked [POST-MVP] are intentionally deferred.

Feature 1: Renter Profile & Criteria Setup

Priority: P0

Users define their search criteria once via a guided 9-step onboarding flow:

1. Welcome + value prop
2. Work address input *(Google Places Autocomplete; users should not need to type exact addresses)*
3. Commute mode (subway, walking, bike, bus)
4. Max acceptable commute time
5. Budget range
6. Bedroom count
7. Move-in date range
8. Must-have amenities (laundry, elevator, dishwasher, pet-friendly, etc.)
9. Dealbreakers (optional)

At completion: user lands on their scored feed immediately. No blank state.

Saved to user's account (database-backed). Used to score listings and power the daily digest.

Feature 2: AI-Scored Listing Feed

Priority: P0

Every listing in the feed is scored across three dimensions:

Commute score (0–100): Real transit time via Google Maps Directions API from listing address to user's work address, normalized against user's stated max. Displayed as a colored badge: 🟢 18 min / 🟡 32 min / 🔴 47 min.

Must-haves score (0–100): % of user's non-negotiable criteria met (no-fee, laundry, pet policy, etc.)

Rent stabilization probability (Low / Medium / High): Derived from NYC PLUTO heuristic — pre-1974 construction, 6+ units, Class C/D/S zoning.

A composite AptScore (0–100) is calculated per listing and used for default sort.

Freemium gate: Commute score visible on first 3 listings. Full AptScore and rent stab probability require Pro.

Feature 3: Price Drop Alerts

Priority: P0

Price drop detection is powered by a price_history table that records (listing_id, price, observed_at) every pipeline run. A drop is detected when the most recent price is lower than the first recorded price for that listing.

Important cold-start note: Price history begins accumulating from the day the pipeline runs. Price drop alerts will not trigger until at least 7 days of history exist. This is expected behavior; users should be informed during onboarding ("We'll alert you if any of your saved listings drop in price — typically within 1–2 weeks as we build your price history.").

When a price drop is detected on a saved listing:

In-app notification (badge on listing)

Email notification (daily batch by default, instant opt-in)

Price history sparkline chart displayed on listing card

Pro feature only.

Feature 4: Daily Digest

Priority: P1

A daily email summarizing:

New listings matching the user's criteria added since their last session

Price drops on saved listings

~~Listings about to expire~~ *(removed — StreetEasy does not provide expiration data; we cannot reliably predict this)*

Email sent at 8:00 AM via Resend. Personalized per user profile. Includes direct links back to the app.

The digest is the primary retention driver for days when users don't open the app.

Pro feature only.

Feature 5: Listing Management

Priority: P1

Users can:

Save a listing (persisted to database, powers price drop alerts and digest)

Dismiss a listing (excluded from future feed views permanently)

Mark as contacted with optional timestamp and notes

View activity log — all listings viewed, saved, dismissed in this search cycle

Feature 6: Waitlist Landing Page

Priority: P0 — Ship on day 1 of development, not week 11.

A single-page marketing site collecting email addresses before the app launches. This page runs independently of app development and feeds the launch-day waitlist.

Contents:

One-sentence value prop

Three feature highlights (commute score, rent stab, daily digest)

Email capture form

"NYC only — more cities coming" note

Built with Next.js (or a static page on Vercel). No design overhaul required — functional is enough.

Feature 7: Subscription & Payments (Stripe Billing)

Priority: P0

Implementation note: This is Stripe Billing (subscriptions product), not a one-time Stripe Payment. The implementation scope includes:

Stripe Billing subscription creation (monthly, $15/month, cancel anytime)

Stripe webhook handling: customer.subscription.created, customer.subscription.deleted, customer.subscription.updated, invoice.payment_failed

User entitlement system: a subscription_status field on the users table, kept in sync via webhooks in real time. App reads from DB, not from Stripe on every request.

Grace period: 3-day grace period on failed payment before downgrading to free tier

Idempotency on webhook processing (Stripe may deliver webhooks multiple times)

Self-serve cancellation flow in account settings

Estimated scope: 4–5 days. Do not underscope this.
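To make the idempotency and grace-period requirements concrete, here is a rough in-memory sketch. It assumes events arrive already verified and parsed into the shape `{ id, type, data: { object } }`, where the object carries `customer` and (for subscriptions) `status`. The status values 'pro' / 'free', the `graceEndsAt` field, and the Set and Map standing in for database tables are all illustrative assumptions, not the planned schema.

```javascript
const GRACE_MS = 3 * 24 * 60 * 60 * 1000; // 3-day grace period from the spec

function createWebhookHandler() {
  const processed = new Set(); // stand-in for a table with a unique event-id column
  const users = new Map();     // stripe_customer_id -> { subscription_status, graceEndsAt }

  function handle(event, now = Date.now()) {
    if (processed.has(event.id)) return 'duplicate'; // redelivered event: do nothing
    const obj = event.data.object;
    const user = users.get(obj.customer) ?? { subscription_status: 'free', graceEndsAt: null };
    switch (event.type) {
      case 'customer.subscription.created':
      case 'customer.subscription.updated':
        if (obj.status === 'active') {
          user.subscription_status = 'pro';
          user.graceEndsAt = null;
        }
        break;
      case 'customer.subscription.deleted':
        user.subscription_status = 'free';
        user.graceEndsAt = null;
        break;
      case 'invoice.payment_failed':
        // Start the grace window once; later failures do not extend it.
        if (user.graceEndsAt === null) user.graceEndsAt = now + GRACE_MS;
        break;
      default:
        return 'ignored';
    }
    users.set(obj.customer, user);
    processed.add(event.id);
    return 'applied';
  }

  return { handle, users };
}
```

In a real deployment the duplicate check and the entitlement update would need to happen in one database transaction, so that a crash between the two cannot leave an event half-applied.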

Tiers:

| Feature | Free | Pro ($15/month) |
|---|---|---|
| Browse listings | ✅ | ✅ |
| Basic filters | ✅ | ✅ |
| Commute score | 3 listings | Unlimited |
| AptScore (composite) | ❌ | ✅ |
| Rent stab probability | ❌ | ✅ |
| Price drop alerts | ❌ | ✅ |
| Daily digest | ❌ | ✅ |
| Saved listings | 5 | Unlimited |
| Dismiss listings | ❌ | ✅ |
| Contact log | ❌ | ✅ |

4. Technical Architecture

Authentication

Decision: Clerk (locked)

Clerk is the correct choice for this team and timeline. Rationale:

Managed auth-as-a-service: sign-in, sign-up, email verification, session management handled

~1 day integration vs ~1.5 weeks for NextAuth done properly

Pre-built Next.js middleware and React components

Cost: $25/month at our scale — acceptable

Database Schema (Key Tables)


users
  id, clerk_user_id, email, subscription_status, stripe_customer_id, created_at
profiles
  id, user_id (FK), work_address, work_lat, work_lng, commute_mode, max_commute_min,
  budget_min, budget_max, bedrooms, move_in_start, move_in_end, amenities (jsonb),
  dealbreakers (jsonb), created_at, updated_at
listings
  id, external_id, source (streeteasy|renthop), address, lat, lng, price, bedrooms,
  bathrooms, sqft, no_fee, amenities (jsonb), url, first_seen_at, last_seen_at, active
price_history
  id, listing_id (FK), price, observed_at
user_listings
  id, user_id (FK), listing_id (FK), status (saved|dismissed|contacted),
  commute_minutes (cached), aptscore (cached), notes, updated_at

Google Maps Caching Strategy


Decision: On-demand compute with (address, work_address, mode) tuple cache
We do NOT pre-compute commute times for all listings at pipeline ingestion. That approach costs $2,500+/day at scale (see engineering review). Instead:

1. When a user views a listing, check user_listings for a cached commute_minutes value.
2. If not cached: call the Google Maps Directions API and store the result in user_listings.commute_minutes.
3. Cache is scoped per (user, listing), because different users have different work addresses.
4. Cache TTL: 7 days (commute time for a given address pair rarely changes).

Estimated cost at 500 active users: ~$150–300/month. Acceptable.
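A minimal sketch of the read and write sides of that cache, with a Map standing in for the user_listings table. The helper names and the `cachedAt` timestamp are assumptions (the schema above lists only updated_at); the Directions API call itself is left to the caller.

```javascript
const CACHE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL from the strategy above

// Returns cached minutes for (user, listing), or null when missing or expired.
function readCachedCommute(cache, userId, listingId, now = Date.now()) {
  const entry = cache.get(`${userId}:${listingId}`);
  if (!entry || now - entry.cachedAt >= CACHE_TTL_MS) return null;
  return entry.minutes;
}

// Stores a freshly computed commute time with its timestamp.
function writeCachedCommute(cache, userId, listingId, minutes, now = Date.now()) {
  cache.set(`${userId}:${listingId}`, { minutes, cachedAt: now });
}
```

A request handler would call readCachedCommute first and only hit the Directions API (followed by writeCachedCommute) when it returns null.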

Data Pipeline

Listing Ingestion:

Primary source: StreetEasy web scraper (residential IP, rate-limited to 1 req/3s)

Fallback: StreetEasy scraper via Browserless.io (managed headless browser, avoids IP blocking) — evaluate cost/benefit at first blocking event

RentHop as secondary source: pending confirmation of API access terms. Do not treat as confirmed fallback until contracts/terms reviewed.

Refresh cadence: every 4 hours

Enrichment (per new listing, at ingestion time — not per user):

NYC Planning Labs GeoSearch → BBL

NYC MapPLUTO → rent stab heuristic

Google Places API v1 → building reviews + Gemini summary

Note: Commute time is NOT computed at ingestion. It is computed on-demand per user, per listing viewed (see caching strategy above).

Stack

Frontend: Next.js

Backend: Next.js API routes + Node.js pipeline scripts

Database: PostgreSQL (Supabase or Railway — managed Postgres)

Auth: Clerk

Payments: Stripe Billing

Hosting: Vercel (frontend), Railway (pipeline cron)

Email: Resend

Scraping resilience: Browserless.io (standby, activate on first Cloudflare block)

5. User Stories (MVP)

| ID | As a... | I want to... | So that... | Priority |
|---|---|---|---|---|
| US-01 | New user | Set up my search profile in under 5 min with address autocomplete | I see relevant results immediately | P0 |
| US-02 | Active renter | See commute time for listings without opening Google Maps | I don't waste time on bad commutes | P0 |
| US-03 | Active renter | See rent stabilization probability on listings | I can prioritize long-term affordability | P0 |
| US-04 | Active renter | Get alerted when a saved listing drops in price | I don't miss a deal | P0 |
| US-05 | New visitor | Leave my email on a waitlist | I get notified when the product launches | P0 |
| US-06 | Active renter | Receive a daily digest of new matching listings | I stay informed without checking constantly | P1 |
| US-07 | Active renter | Dismiss listings I've rejected | My feed stays clean | P1 |
| US-08 | Active renter | Log when I've contacted a landlord | I don't double-contact | P1 |
| US-09 | Pro subscriber | Sort by composite AptScore | I see my best options first | P1 |

6. Out of Scope (MVP)

Explicitly deferred:

iOS / Android native apps

Multi-city support

Roommate collaborative mode

HPD violations data

In-app messaging

Landlord-facing tools

Social features

7. Revised Timeline

*Adjusted per engineering review. Database migration and Stripe billing scoped correctly.*

| Week | Milestone |
|---|---|
| Pre-development (now) | Launch waitlist landing page (1-day build) |
| 1–3 | Database schema + migration, Clerk auth integration, Stripe Billing + webhooks |
| 4–5 | User onboarding flow (9 steps, Places Autocomplete), profile save/load |
| 6–7 | Multi-user listing feed with per-user AptScore; on-demand commute scoring with cache |
| 8 | Price history accumulation + price drop detection; email notifications (Resend) |
| 9–10 | Daily digest email; listing save/dismiss/contact log |
| 11 | Freemium paywall, Stripe checkout UI, account management |
| 12 | QA, load testing, soft launch to waitlist |

Target: soft launch end of week 12 (late June 2026).

8. Open Questions (Resolved)

| Question | Resolution |
|---|---|
| Auth: NextAuth vs Clerk | Clerk (locked) |
| Maps: pre-compute vs on-demand | On-demand with tuple cache |
| RentHop API | Unconfirmed. Not a stated fallback until verified. |
| Listing expiration signal | Removed (not computable from available data) |
| Waitlist timing | Day 1 of development |
| Onboarding work address input | Google Places Autocomplete |

*PRD v2.0 — Approved. Engineering may proceed.*

Engineering Review: AptHunter PRD v1.0

Reviewer: Senior Engineer
Date: March 2026
Status: Feedback Round 1 (Requires Revision Before Implementation)

Hey, I read through both docs. Concept is solid and I want to build it. But there are several things in the PRD that are either underspecified, optimistically scoped, or will cause real pain in implementation. Writing this up formally because I don't want to discover these things in week 7.

Critical Issues (Blockers)

1. "1 week for database migration" is not realistic

The PRD says: *"Multi-user architecture: requires database migration before public launch. Estimated: 1 week."*

This is not a migration. This is a rewrite of the data layer. Right now everything is keyed on flat JSON files per session. To support multiple users you need:

A users table

A profiles table (one per user, or many-to-one for roommate mode)

A listings table (shared, enriched data)

A user_listings table (per-user save/dismiss/contact status)

A price_history table (per listing, time-series)

A sessions/auth table

Then you need to rewrite every API route that currently reads from profile.json or results.json to query Postgres instead. And the pipeline that writes to results.json needs to write to the DB instead, scoped per user profile.

This is probably 2.5–3 weeks of work, not 1. If we compress it to 1 week we will have data model problems we're fixing for the rest of the project. Please revise the timeline.

2. The Google Maps cost math is probably wrong at scale

The PRD calculates: *"500 users × 20 listings/session × 4 sessions/month = 40,000 requests/month = $200/month."*

This assumes we only call the Directions API when a user is in a session actively viewing listings. But the PRD also says we should "pre-compute for popular listings at pipeline ingestion."

If we pre-compute at ingestion:

NYC has approximately 3,000–5,000 active rental listings on StreetEasy at any time

We refresh every 4 hours = 6 refreshes/day

Even if only 20% of listings are new per refresh: 1,000 new listings/day

Each listing needs a Directions API call PER USER PROFILE (because commute is origin-dependent)

At 500 users with distinct work addresses: 500 × 1,000 = 500,000 calls/day

That is $2,500/day, not $200/month. Clearly we cannot pre-compute per-user.

The correct approach: compute commute time on-demand when a user views a listing, cache the result by (listing_address, work_address, transit_mode) tuple. The cache key is user-agnostic if two users share a work address. Estimated real cost at 500 active users: ~$150–300/month. But this needs to be explicitly architected, not left as an open question.
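For reference, the two cost figures above can be reproduced with a few lines of arithmetic. The $5 per 1,000 requests rate is not stated anywhere in the PRD or this review; it is the rate implied by both numbers ($200 for 40,000 calls, $2,500 for 500,000 calls), so treat it as an assumption to check against current Google Maps pricing.

```javascript
// Rate implied by the PRD and review figures; verify against current pricing.
const DOLLARS_PER_1000_CALLS = 5;
const costFor = (calls) => (calls / 1000) * DOLLARS_PER_1000_CALLS;

// PRD scenario: calls only while users view listings.
const onDemandCallsPerMonth = 500 * 20 * 4; // users × listings/session × sessions/month
const onDemandMonthlyCost = costFor(onDemandCallsPerMonth);

// Pre-compute scenario: every new listing for every user profile, every day.
const precomputeCallsPerDay = 500 * 1000; // users × new listings/day
const precomputeDailyCost = costFor(precomputeCallsPerDay);

console.log({ onDemandCallsPerMonth, onDemandMonthlyCost, precomputeCallsPerDay, precomputeDailyCost });
```

The gap comes entirely from the multiplier: pre-computing scales with users × new listings, while on-demand scales with listings actually viewed.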

3. "RentHop API has official public endpoints" — does it actually?

The PRD states this as fact. I can't find documented public API access for RentHop. Their API appears to be available only through partnership agreements, not as a self-serve developer API. This matters because it's listed as our primary fallback if StreetEasy scraping breaks.

Before we treat this as a fallback, someone needs to actually confirm: (a) RentHop has a public or partner API, (b) what the terms are, (c) what it costs. If this doesn't exist, we're more exposed on data fragility than the PRD acknowledges. Please remove this as a stated fallback until confirmed.

4. Price history requires storage from day one — this affects the timeline

The PRD lists price drop alerts as P0 but the database schema work is described as week 1–2. For price drop detection to work, we need:

A price_history table that records (listing_id, price, observed_at) every time the pipeline runs

At least 1–2 weeks of historical data before we can detect *any* drops

This means the feature is only testable starting ~2 weeks after the DB is running

This isn't a blocker per se, but the timeline doesn't account for the cold-start period on price history. Users who sign up in week 12 will not see price drop functionality working correctly until week 14. This should be communicated in the launch plan.

Significant Issues (Need Answers Before Build)

5. Authentication: NextAuth vs Clerk — we need to pick one now

The PRD says "NextAuth.js or Clerk." This is not a trivial choice:

NextAuth: Open source, self-hosted, more configuration work, requires us to handle session storage, email verification flows, etc. Probably 1.5 weeks of setup to do properly.

Clerk: Managed auth-as-a-service, ~1 day of integration, $25/month at our scale. Pre-built UI components for sign-in/sign-up.

Given the team size and timeline, Clerk is the right call. But this decision needs to be made in week 1, not discovered as a debate in week 2. Recommend locking Clerk in the PRD and removing ambiguity.

6. Stripe: Subscriptions vs one-time payments — meaningful difference

"Stripe integration skeleton" is vague. For a monthly subscription model we need:

Stripe Billing (subscriptions product, not payments)

Webhook handling for subscription created, updated, cancelled, payment failed

User entitlement system: how does our app know a user is on Pro vs Free in real time?

Grace period handling for failed payments

This is probably 4–5 days of work to do properly, not a skeleton. The webhook handling alone has meaningful edge cases (what happens if a webhook is delayed? what if we process it twice?). Underspecifying this creates billing bugs that are very hard to debug in production.

7. The "4 AI-augmented devs" estimation problem

The timeline estimates work for a 4-person team. Two of those developers are "primarily AI-assisted." I have no problem with this in principle — AI tools genuinely accelerate development for well-specified tasks. But I want to flag:

AI-assisted development is fastest on greenfield code with clear specs. This PRD has several underspecified areas (see above). Unclear specs fed to AI tools produce plausible-looking code with subtle bugs.

The DB migration and Stripe webhook work specifically need a human reviewing the output carefully. These are the two places where "it looks like it works" bugs have the most consequences (data loss, billing errors).

Not a blocker — just a request to make sure the AI-assisted devs are working from tighter specs on the critical paths.

Minor Issues

8. "Soft launch to waitlist" — where does the waitlist come from?

The timeline ends with "soft launch to waitlist" in week 12. There is no mention of waitlist collection anywhere in the product spec. If we want 500 users at launch, we need to be collecting emails starting now, not in week 11. This should be a marketing task that starts in parallel with development — a simple landing page with email capture is a 1-day build.

9. The onboarding flow assumes users know their exact work address

Step 2 of onboarding: *"Work address input."* Most users know their office building or neighborhood but not the exact street address. Consider: autocomplete (Google Places Autocomplete API), and making this field optional at onboarding with a prompt to add later. Blocking activation on an exact address input will hurt conversion.

10. "Listings about to expire from the market" in daily digest — how do we know when a listing will expire?

The daily digest spec includes: *"Listings about to expire from the market (high urgency signal)."* StreetEasy doesn't provide listing expiration dates. We can approximate staleness by detecting when a listing disappears from our scrape, but by definition we only know a listing is gone *after* it's gone. We cannot predict expiration. Remove this feature from the digest spec or reframe it as "listings that have been active for 30+ days with recent price drops" (which we can calculate).

Summary

The PRD is directionally correct and I support the build. But before we start week 1, I need:

1. Timeline revised: DB migration = 2.5 weeks, Stripe = 5 days, push overall timeline accordingly
2. RentHop API confirmed or removed as a stated fallback
3. Google Maps caching strategy explicitly specified (on-demand with tuple cache, per my note above)
4. Auth locked to Clerk
5. Listing expiration signal removed from digest spec
6. Waitlist landing page added as a pre-launch task (day 1, not week 11)

Happy to talk through any of this. I want to ship this — just want to ship it without the landmines.

*Engineering Review v1 — Respond with revised PRD or async comments*

AptHunter — Proof of Concept Overview

Status: Working PoC · NYC · Single-user

Last Updated: March 2026

What It Is

AptHunter is a working, end-to-end apartment intelligence pipeline. It scrapes live NYC listings from StreetEasy, enriches each one with real data from Google Maps and NYC public databases, scores them against a saved renter profile, and surfaces the top matches via a Telegram daily digest and a local web dashboard (Mission Control).

This is a functioning proof of concept — not a prototype or mockup. It runs on a live cron schedule, pulls real listings, computes real commute times, and detects rent stabilization using real city data.

How It Works — End to End

```
StreetEasy (scraped)
        │
        ▼
[ scraper.js ]           Playwright + stealth plugin
                         Raw listings: address, price, bedrooms,
                         title, description, neighborhood,
                         listing URL, broker fee status
        │
        ▼
[ scorer.js ]            Must-have filter (price, beds,
  Hard filter            laundry, full bath)
        │
        ▼
[ enrich.js ]            Google Maps API:
  Geocoding +            - Geocode address → lat/lng
  Transit time           - Directions API → transit time
  Building reviews       - Places API v1 → reviews + Gemini AI summary
        │
        ▼
[ rent-stab.js ]         Free NYC public APIs:
  Rent stabilization     - Planning Labs GeoSearch → BBL
  detection              - MapPLUTO Open Data → year built,
                           unit count, building class
                         → Heuristic: pre-1974 + 6+ units +
                           multi-family class = likely RS
        │
        ▼
[ detail-scraper.js ]    Second Playwright pass:
  Detail enrichment      - Open house dates
                         - Broker contact info
        │
        ▼
[ scorer.js ]            Composite AptScore (0–100):
  Scoring                - Must-haves % met
                         - Commute score
                         - Price value vs. budget
                         - Freshness
                         - Neighborhood score
        │
        ▼
[ price-history.js ]     Price drop detection:
  Price tracking         Records price per listing per run.
                         Detects and flags drops with
                         $ amount and % change.
        │
        ▼
[ notify.js ]            Telegram message with:
  Telegram digest        - Top 5 listings scored + ranked
                         - Address, price, bedrooms
                         - Commute time (Google Maps)
                         - Laundry, elevator status
                         - Rent stab probability badge
                         - Price drop alerts
                         - Direct link to listing
        │
        ▼
[ results.json ]         Consumed by Mission Control
  Dashboard output       web UI at localhost:3000/apartments
```
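
The price-tracking stage in the flow above can be sketched as two small functions. This is a minimal illustration, not the contents of `price-history.js`: the in-memory history shape and the function names `recordPrice` and `detectPriceDrop` are assumptions, and the baseline used here (the highest price recorded on any earlier day) is one reasonable choice.

```javascript
// Assumed history shape: { [listingId]: { "YYYY-MM-DD": price } }

// Returns a new history object with today's price recorded for the listing.
function recordPrice(history, id, dateIso, price) {
  const byDate = { ...(history[id] || {}), [dateIso]: price };
  return { ...history, [id]: byDate };
}

// Compares the current price with the highest price seen on any earlier day.
// Returns null when there is no earlier record or the price has not dropped.
function detectPriceDrop(history, id, dateIso, currentPrice) {
  const priorPrices = Object.entries(history[id] || {})
    .filter(([date]) => date < dateIso) // ISO dates compare correctly as strings
    .map(([, price]) => price);
  if (priorPrices.length === 0) return null;
  const previousMax = Math.max(...priorPrices);
  if (currentPrice >= previousMax) return null;
  const amount = previousMax - currentPrice;
  const percent = Math.round((amount / previousMax) * 1000) / 10; // one decimal place
  return { id, previousMax, currentPrice, amount, percent };
}

// Usage: a listing seen at $4,000 yesterday and $3,800 today.
let exampleHistory = recordPrice({}, "a1", "2026-03-01", 4000);
exampleHistory = recordPrice(exampleHistory, "a1", "2026-03-02", 3800);
const exampleDrop = detectPriceDrop(exampleHistory, "a1", "2026-03-02", 3800);
// exampleDrop.amount is 200 and exampleDrop.percent is 5
```

Keying by calendar day means several runs on the same day overwrite each other instead of producing false drops.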

Data Sources

| Source | What It Provides | Cost |
|---|---|---|
| StreetEasy | Listing data (address, price, bedrooms, fee status, description, photos) | Free (scraped via Playwright + stealth) |
| Google Maps Directions API | Real transit time from listing → work address | ~$0.005/request |
| Google Maps Geocoding API | Lat/lng coordinates from address | ~$0.005/request |
| Google Places API v1 | Building name, reviews, Gemini AI summary | ~$0.017/request |
| NYC Planning Labs GeoSearch | Address → BBL (lot identifier) | Free, no key |
| NYC MapPLUTO Open Data | BBL → year built, unit count, zoning class | Free, no key |

Key Files

| File | Role |
|---|---|
| `pipeline/index.js` | Orchestrator: runs the full pipeline, writes results |
| `scraper.js` | StreetEasy scraper (Playwright + stealth plugin) |
| `scorer.js` | Must-have filtering + composite AptScore calculation |
| `enrich.js` | Google Maps enrichment (geocode, transit, reviews) |
| `rent-stab.js` | Rent stabilization detection via PLUTO heuristic |
| `detail-scraper.js` | Second-pass scraper for open house + broker contact |
| `price-history.js` | Price tracking + drop detection across runs |
| `notify.js` | Telegram digest formatter + sender |
| `neighborhoods.js` | Neighborhood scoring (transit access, walkability) |
| `availability.js` | Move-in date filter |
| `dismissed.js` | Tracks user-dismissed listings (excluded from future runs) |
| `store.js` | Tracks seen listings (prevents duplicate alerts) |
| `profile.json` | Renter criteria: budget, bedrooms, commute, preferences |
| `results.json` | Output, consumed by Mission Control dashboard |
| `run-daily.sh` | Shell script run by LaunchAgent cron at 8AM daily |

Renter Profile

The system is profile-driven. Paula's current active profile:

```json
{
  "name": "dream",
  "label": "Dream Apt",
  "maxRent": 5000,
  "minBedrooms": 2,
  "maxBedrooms": 2,
  "requirements": {
    "fullBath": true,
    "laundry": "in-building"
  },
  "commute": {
    "destination": "Flatiron",
    "goodLines": ["F", "M", "N", "R", "1", "2", "3", "6"],
    "maxWalkMinutes": 10
  }
}
```

A second "backup" profile (1BR, $3,800 max) is also defined. A roommate mode is active — each listing is scored against both Paula's criteria and Sam's, with compatibility badges (both / paula-only).
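
To make the profile-driven filtering concrete, here is a minimal sketch of a must-have check and a roommate badge. It is an illustration under stated assumptions, not the code in `scorer.js`: the listing fields (`price`, `bedrooms`, `fullBath`, `laundry`), the function names, the rule that in-unit laundry also satisfies an "in-building" requirement, and the roommate's profile values are all hypothetical.

```javascript
// Assumed ordering of laundry options; a higher rank satisfies a lower requirement.
const LAUNDRY_RANK = { none: 0, "in-building": 1, "in-unit": 2 };

// Returns true only if the listing passes every hard requirement in the profile.
function meetsMustHaves(listing, profile) {
  const req = profile.requirements || {};
  if (listing.price > profile.maxRent) return false;
  if (listing.bedrooms < profile.minBedrooms) return false;
  if (listing.bedrooms > profile.maxBedrooms) return false;
  if (req.fullBath && !listing.fullBath) return false;
  if (req.laundry) {
    const have = LAUNDRY_RANK[listing.laundry] ?? 0;
    if (have < LAUNDRY_RANK[req.laundry]) return false;
  }
  return true;
}

// Badge for roommate mode: "both", "paula-only", or null when the primary
// profile rejects the listing. Treating that last case as null is an assumption.
function compatBadge(listing, primaryProfile, roommateProfile) {
  if (!meetsMustHaves(listing, primaryProfile)) return null;
  return meetsMustHaves(listing, roommateProfile) ? "both" : "paula-only";
}

// Usage with the "dream" profile values shown above and a made-up roommate profile.
const dreamProfile = {
  maxRent: 5000, minBedrooms: 2, maxBedrooms: 2,
  requirements: { fullBath: true, laundry: "in-building" },
};
const roommateExample = { ...dreamProfile, maxRent: 4500 }; // hypothetical values
const sampleListing = { price: 4800, bedrooms: 2, fullBath: true, laundry: "in-unit" };
const sampleBadge = compatBadge(sampleListing, dreamProfile, roommateExample); // "paula-only"
```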

Scoring Logic (AptScore 0–100)

| Component | Weight | Signal |
|---|---|---|
| Must-haves | Hard filter | Laundry, full bath, price ceiling (fails = excluded) |
| Commute | 0–35 pts | Real Google Maps transit time vs. max acceptable |
| Price value | 0–15 pts | How far under budget (more headroom = higher score) |
| Freshness | 0–10 pts | How recently listed (< 2 hours = max) |
| Neighborhood | 0–15 pts | Transit access, walkability heuristic |
| Nice-to-haves | 0–10 pts | Elevator, in-unit laundry, dishwasher |
| Roommate compat | Modifier | Adjusts ranking if searching with roommate |
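
The additive part of the table can be sketched as a capped weighted sum. Only the point caps come from the table; the linear shapes of `commuteSignal` and `priceSignal`, the function names, and the idea of passing each component as a 0 to 1 signal are assumptions made for illustration. Note that the listed caps add up to 85, and the table does not say how the rest of the 0–100 range is filled, so this sketch does not rescale.

```javascript
// Point caps taken from the scoring table.
const MAX_POINTS = { commute: 35, priceValue: 15, freshness: 10, neighborhood: 15, niceToHaves: 10 };

const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Assumed linear commute signal: 1 at zero minutes, 0 at or beyond the max acceptable.
function commuteSignal(minutes, maxMinutes) {
  return clamp01(1 - minutes / maxMinutes);
}

// Assumed budget-headroom signal: fraction of the budget left unspent.
function priceSignal(price, maxRent) {
  return clamp01((maxRent - price) / maxRent);
}

// Converts 0..1 signals into points per component and sums them.
// Missing signals count as 0.
function aptScore(signals) {
  const breakdown = {};
  let total = 0;
  for (const [name, max] of Object.entries(MAX_POINTS)) {
    const points = Math.round(clamp01(signals[name] ?? 0) * max);
    breakdown[name] = points;
    total += points;
  }
  return { total, breakdown };
}

// Usage: a 20-minute commute against a 40-minute ceiling, $4,000 rent on a $5,000 budget.
const exampleScore = aptScore({
  commute: commuteSignal(20, 40),
  priceValue: priceSignal(4000, 5000),
  freshness: 1,
  neighborhood: 0.6,
  niceToHaves: 0.5,
});
```

Returning the per-component breakdown alongside the total makes it easy for the dashboard or digest to explain why a listing ranked where it did.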

Rent Stabilization Detection

Uses two free NYC public APIs, no key required:

1. NYC Planning Labs GeoSearch → resolves address to a BBL (Borough Block Lot identifier)
2. NYC MapPLUTO → queries BBL for: year built, number of units, building class

Heuristic:

🔒 Likely RS: pre-1974 construction + 6+ units + multi-family rental class (C/D/S)

⚡ Check 421-a: post-1974 + large building (may have tax exemption requiring RS)

❓ Unknown: condo, co-op, single-family, or insufficient data
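
The three buckets can be expressed as a small pure function over the PLUTO fields. This is a sketch under assumptions, not the contents of `rent-stab.js`: the input field names, the return labels, the 50-unit placeholder for "large building" (the text gives no number), and treating 1974 itself as post-1974 are all choices made here for illustration.

```javascript
// Building-class prefixes named in the heuristic (C, D, S).
const RENTAL_CLASS_PREFIXES = ["C", "D", "S"];

// building: { yearBuilt: number, units: number, buildingClass: string }
// largeBuildingUnits is a placeholder threshold, not a documented value.
function classifyRentStab(building, largeBuildingUnits = 50) {
  const { yearBuilt, units, buildingClass } = building;
  // Missing or zero values count as insufficient data.
  if (!yearBuilt || !units || !buildingClass) return "unknown";

  // Condo, co-op, and single-family classes fall outside C/D/S.
  const prefix = buildingClass[0].toUpperCase();
  if (!RENTAL_CLASS_PREFIXES.includes(prefix)) return "unknown";

  if (yearBuilt < 1974 && units >= 6) return "likely-rs";
  if (yearBuilt >= 1974 && units >= largeBuildingUnits) return "check-421a";
  return "unknown";
}

// Usage
const prewarWalkup = classifyRentStab({ yearBuilt: 1925, units: 20, buildingClass: "C1" }); // "likely-rs"
const newTower = classifyRentStab({ yearBuilt: 2010, units: 120, buildingClass: "D4" }); // "check-421a"
```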

Caveat: DHCR registration is voluntary and incomplete. This heuristic is as good as public data allows. Ground truth requires a formal HCR rent history request per unit.

How It Runs

Daily cron via macOS LaunchAgent:

```
08:00 AM → run-daily.sh
  → node pipeline/index.js --all
  → StreetEasy scrape
  → Google Maps enrichment
  → Rent stab check
  → Score + rank
  → Telegram digest to Paula
  → results.json → Mission Control
```

Manual runs:

```bash
node pipeline/index.js --all             # Full run, all listings
node pipeline/index.js --dummy           # Use test data, no API calls
node pipeline/index.js --dry-run         # Run everything, skip Telegram
node pipeline/index.js --profile backup  # Use backup profile
```
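
For reference, the four flags above could be read with a few lines of plain Node. This is a hypothetical sketch rather than the parsing in `pipeline/index.js`; the function name `parseFlags`, the shape of the returned object, and the `null` default for `profile` are assumptions.

```javascript
// Parses the flags shown above from an argv-style array of strings.
// Unknown arguments are ignored.
function parseFlags(argv) {
  const flags = { all: false, dummy: false, dryRun: false, profile: null };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--all") flags.all = true;
    else if (arg === "--dummy") flags.dummy = true;
    else if (arg === "--dry-run") flags.dryRun = true;
    else if (arg === "--profile") flags.profile = argv[++i] ?? null; // consumes the next token
  }
  return flags;
}

// Usage: in a real script this would be parseFlags(process.argv.slice(2)).
const exampleFlags = parseFlags(["--dry-run", "--profile", "backup"]);
// exampleFlags is { all: false, dummy: false, dryRun: true, profile: "backup" }
```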

What's Built vs. What's Next

Working now:

✅ StreetEasy scraper (Playwright stealth, residential IP)

✅ Google Maps enrichment (geocode + transit + Places reviews)

✅ Rent stabilization detection (PLUTO heuristic)

✅ Composite scoring with roommate mode

✅ Price drop detection and alerting

✅ Daily Telegram digest (top 5 listings)

✅ Mission Control web dashboard (localhost:3000/apartments)

✅ Dismiss listings, saved profiles, move-in date filter

✅ Onboarding flow at /onboarding

Not yet built (roadmap):

⬜ User accounts / auth (currently single-user)

⬜ Stripe subscription / freemium paywall

⬜ Email digest

⬜ Public URL / deployment

⬜ Multi-city support

Tech Stack

| Layer | Technology |
|---|---|
| Scraping | Playwright + puppeteer-extra-plugin-stealth |
| Backend pipeline | Node.js (ES modules) |
| Frontend dashboard | Next.js 14 (App Router) |
| Styling | Tailwind CSS (pixel-art dark theme) |
| Data storage | JSON files (results.json, profile.json, etc.) |
| Notifications | Telegram Bot API |
| Maps / Places | Google Maps Platform APIs |
| City data | NYC Open Data (MapPLUTO, GeoSearch) |
| Scheduler | macOS LaunchAgent (plist) |

*Screenshots and demo video available in the project docs folder.*

