results.json. Then refresh /dashboard.

Computations and filtering
This document describes what the Apt Hunter pipeline and dashboard compute, and how each value is derived. Implementation lives mainly under pipeline/ (Node) and app/apartments/page.tsx (dashboard).
Pipeline flow (order of operations)
When you run node pipeline/index.js (or npm run scrape:dry), the orchestrator in pipeline/index.js roughly does:
1. Fetch: StreetEasy (pipeline/scraper.js) and Zumper (pipeline/zumper-scraper.js) in parallel; dedupe by listing id.
2. Seen filter: Unless --all, keep only listings whose id is not in .seen-listings.json (filterNew). After a successful non-dry-run, processed listings are marked seen (markSeen).
3. Enrich: If GOOGLE_MAPS_API_KEY is set (pipeline/enrich.js): geocode, transit minutes to commute destination(s), Places reviews/summary. If the key is missing, listings pass through unchanged (log warns).
4. Score: scoreListings or scoreListingsWithRoommate (pipeline/scorer.js).
5. Dismissed: Remove IDs in .dismissed.json (filterDismissed).
6. Availability filter: filterByAvailability (pipeline/availability.js) using profile.availableBy and a ±30 day window (see below).
7. Price history: Append today's price per id to .price-history.json; price drops compare current price to the max price recorded on any prior calendar day (detectPriceDrops).
8. Neighborhood median context: computeMedianPriceContext adds per-listing fields vs the median rent in the current batch for that neighborhood string.
9. Detail scrape: Optional open house / broker fields (pipeline/detail-scraper.js), unless --dummy.
10. Write results.json: Top listings for the UI (see profile.delivery.topN; the file also caps stored rows for the dashboard).
--rescore re-reads existing results.json listings, re-stamps daysOnMarket from the seen store, and re-runs scoring + availability + median without scraping.
Scraper-level constraints (before scoring)
| Source | Constraints |
| --- | --- |
| StreetEasy | price:-{maxRent} and beds:1,2 (fixed 1–2 BR in code today), sort_by=listed_desc. |
| Zumper | max_price, min_beds / max_beds from profile.minBedrooms / profile.maxBedrooms, and no_fee=true. |

So bedroom coverage can differ by source: Zumper follows your profile; StreetEasy is always 1–2 BR in the current implementation.
Hard exclusions in scoreListings
These conditions drop the listing from scored results (no row in the pipeline output for that run):
- price > profile.maxRent.
- requirements.laundry === 'in-building' and detected laundry is none (no in-unit / in-building signal in title, description, tags, or scraper field).

Not hard-filtered in scoreListings (but affect must-have % and roommate checks):
- Bedroom count: Must match minBedrooms–maxBedrooms for a full must-have set; wrong count still gets a total score today.
- requirements.fullBath: Present in profile.json / TypeScript types but not referenced in scorer.js; no pipeline enforcement.
- Move-in date: Handled in must-have % inside scoring and again in filterByAvailability after scoring (see Availability).
Total score (score.total)
The total is the sum of additive components minus penalties (clamped so the total does not go below 0 after a penalty). There is no fixed "out of 100" cap in code; logged "/100" in the CLI is informal.
Additive components
- Laundry: in-unit → 25; in-building → 15; unknown → 8; none → 0 (but none + in-building requirement already excluded). Uses scraper listing.laundry if set and not unknown, else regex on full text (detectLaundry).
- Elevator: counted when preferences.elevator is true and elevator detected (listing.elevator or detectElevator on text).
- Commute (pipeline/neighborhoods.js): first alias match on neighborhood + text → hood.commute (authoring: "ease to Midtown/Flatiron-style commute", not live routing).
- Neighborhood quality: hood.quality. Unknown neighborhood → commute 8, quality 8.
- Price value: priceValueScore(price, maxRent): ratio \(r = price / maxRent\). r ≤ 0.6 → 15; ≤ 0.7 → 12; ≤ 0.8 → 9; ≤ 0.9 → 6; else 3. Missing price → 5.
- Freshness: if daysOnMarket is set: 0 days → 10; 1 → 7; ≤3 → 5; ≤7 → 2; else 0. Else fallback from pubDate age in hours (under 2h → 10, under 6h → 8, under 12h → 6, under 24h → 4, under 48h → 2, else 0).
- Floor: listing.floor or detectFloor (regex). ≥8 → +6; ≥4 → +3; else 0.
- Outdoor space: private → +8; shared → +4; from listing.outdoorSpace or detectOutdoorSpace on text.

Penalties
- Stale: daysOnMarket ≥ 30 or, if DOM missing, pubDate older than 30 days. Sets listing.stale and a warning string.
- Few photos: photoCount ≤ 1.

Sort order after scoring
results.sort: primary score.total descending; tie-break score.breakdown.freshness descending.
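As a rough illustration of the tiered rules and the sort order described in this section, the sketch below reimplements the price-value tiers, the daysOnMarket freshness tiers, and the sort comparator. It is not the code in pipeline/scorer.js; freshnessFromDaysOnMarket and compareScored are hypothetical names chosen for this example.

```javascript
// Price-value tiers: r = price / maxRent, lower ratio earns more points.
function priceValueScore(price, maxRent) {
  if (price == null) return 5; // missing price gets a neutral 5
  const r = price / maxRent;
  if (r <= 0.6) return 15;
  if (r <= 0.7) return 12;
  if (r <= 0.8) return 9;
  if (r <= 0.9) return 6;
  return 3;
}

// Freshness tiers when daysOnMarket is known (hypothetical helper name).
function freshnessFromDaysOnMarket(days) {
  if (days === 0) return 10;
  if (days === 1) return 7;
  if (days <= 3) return 5;
  if (days <= 7) return 2;
  return 0;
}

// Sort comparator: total descending, freshness descending as the tie-break.
function compareScored(a, b) {
  return (
    b.score.total - a.score.total ||
    b.score.breakdown.freshness - a.score.breakdown.freshness
  );
}
```

Passing compareScored to Array.prototype.sort reproduces the ordering above for any rows that carry score.total and score.breakdown.freshness.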
Derived percentages (informational on each listing)
These do not add to score.total; they are computed for UI / digest context.
mustHavesPct
Four checks (or five when roommate availableBy is used in roommate path only; see Roommate):
1. Price: price === null || price <= maxRent
2. Bedrooms: minBedrooms ≤ bedrooms ≤ maxBedrooms
3. Laundry: laundry !== 'none'
4. Availability: if profile.availableBy is set, parse move-in from text (parseAvailability); must fall in [target - 30d, target + 30d] or parsed date is null (treated as pass). If availableBy is unset, this check is true.
Formula: round(100 * (count of true) / 4).
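The four checks and the formula can be sketched as below. This is a minimal illustration, assuming the move-in date has already been parsed and is passed in as parsedMoveIn (a Date or null); that parameter and the DAY_MS constant are assumptions of the example, not names from scorer.js.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// parsedMoveIn: Date or null (assumed to come from parseAvailability).
function mustHavesPct(listing, profile, parsedMoveIn) {
  // Availability passes when no target is set or no date could be parsed.
  let availabilityOk = true;
  if (profile.availableBy && parsedMoveIn) {
    const target = new Date(profile.availableBy).getTime();
    availabilityOk = Math.abs(parsedMoveIn.getTime() - target) <= 30 * DAY_MS;
  }
  const checks = [
    listing.price === null || listing.price <= profile.maxRent,
    listing.bedrooms >= profile.minBedrooms && listing.bedrooms <= profile.maxBedrooms,
    listing.laundry !== 'none',
    availabilityOk,
  ];
  const passed = checks.filter(Boolean).length;
  return Math.round((100 * passed) / 4);
}
```

Because the denominator is fixed at 4, the only possible values are 0, 25, 50, 75, and 100.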
niceToHavesPct
Four checks:
1. laundry === 'in-unit'
2. hasElevator
3. price <= maxRent * 0.8
4. hood.quality >= 16
Formula: round(100 * (count of true) / 4).
commutePct
- If transitMinutes is set (Google Directions):
- Else: round((hood.commute / 20) * 100) using static neighborhood commute score.
Move-in cost estimate
When price exists:
- Security deposit = one month rent (assumed).
- Broker fee = 0 if noFee === true, one month if noFee === false, else 0 (unknown treated as no extra fee in the sum).
- moveInCostEstimate = first month + security + broker component.
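A minimal sketch of this estimate, assuming the function receives the monthly price and the noFee flag directly (the signature is an assumption for illustration):

```javascript
// Returns null when there is no price, mirroring "When price exists".
function moveInCostEstimate(price, noFee) {
  if (price == null) return null;
  const securityDeposit = price;                 // one month rent (assumed)
  const brokerFee = noFee === false ? price : 0; // true or unknown adds nothing
  return price + securityDeposit + brokerFee;
}
```

For a $4,000 listing this gives $8,000 when noFee is true or unknown and $12,000 when noFee is false.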
Roommate mode (scoreListingsWithRoommate)
If profile.json includes a top-level roommate object:
1. Listings are scored with scoreListings using the primary profile (same hard filters).
2. checkRoommateCompat runs for each surviving row against roommate's maxRent, bedroom range, laundry rule, and optionally roommate.availableBy (±30 day window, same as above).
- roommateScore.mustHavesPct: round(100 * passed / totalChecks) with 3 or 4 checks depending on roommate availableBy.
- roommateScore.passes: all of price, beds, laundry true, and if availableBy set, availability check must pass too.
- compat: 'both' if passes, else 'paula-only' (legacy internal label; UI may show generic "you / roommate" copy).
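The roommate check can be sketched as follows. This is an illustration only: it assumes the roommate object uses the same field names as the primary profile, that the laundry rule is laundry !== 'none', and that the availability result is computed elsewhere and passed in as availabilityOk.

```javascript
// availabilityOk: boolean result of the ±30 day window check (assumed precomputed).
function checkRoommateCompat(listing, roommate, availabilityOk) {
  const checks = {
    price: listing.price === null || listing.price <= roommate.maxRent,
    beds: listing.bedrooms >= roommate.minBedrooms && listing.bedrooms <= roommate.maxBedrooms,
    laundry: listing.laundry !== 'none',
  };
  // The fourth check exists only when the roommate has a target date.
  if (roommate.availableBy) checks.availability = Boolean(availabilityOk);

  const values = Object.values(checks);
  const passed = values.filter(Boolean).length;
  const passes = passed === values.length;
  return {
    mustHavesPct: Math.round((100 * passed) / values.length),
    passes,
    compat: passes ? 'both' : 'paula-only',
  };
}
```

With three checks and one failure the percentage rounds to 67; with four checks and one failure it is 75.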
Neighborhood median price (computeMedianPriceContext)
For the current array of listings passed in (after availability filtering):
1. Group price values by listing.neighborhood (non-unknown; unknown buckets as __other__ for grouping only).
2. Median per group: sort prices; odd length β middle; even β average of two middles, rounded.
3. For each priced listing with ≥2 listings in its group: set
- hoodMedianPrice
- vsMedianPrice = price - median (negative = below median)
- vsMedianPct = round(100 * (price - median) / median)

Important: this is not a market-wide median; it covers only the current batch in that run (and neighborhood string as assigned by the scorer).
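The grouping and median steps above can be sketched like this. The sketch returns copies instead of mutating the listings, and it also attaches median fields to the __other__ bucket; both choices are assumptions of the example, and the median helper is a hypothetical name.

```javascript
// Median of a numeric list: middle value, or rounded mean of the two middles.
function median(prices) {
  const sorted = [...prices].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

function computeMedianPriceContext(listings) {
  const keyOf = (l) =>
    l.neighborhood && l.neighborhood !== 'unknown' ? l.neighborhood : '__other__';

  // 1. Group prices by neighborhood string.
  const groups = new Map();
  for (const l of listings) {
    if (l.price == null) continue;
    const key = keyOf(l);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(l.price);
  }

  // 2-3. Attach median context when the group has at least two prices.
  return listings.map((l) => {
    const prices = groups.get(keyOf(l));
    if (l.price == null || !prices || prices.length < 2) return l;
    const m = median(prices);
    return {
      ...l,
      hoodMedianPrice: m,
      vsMedianPrice: l.price - m,
      vsMedianPct: Math.round((100 * (l.price - m)) / m),
    };
  });
}
```

A neighborhood with a single listing in the batch gets no median fields, which is why vsMedianPct can be missing in the dashboard.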
Availability parsing and filtering
parseAvailability(text) (pipeline/availability.js)
Returns a Date or null, using case-insensitive patterns such as:
- "available immediately / now / right away", "move in now" → today
- "available MM/DD…"
- "available {Month} {day}" (year defaults to current calendar year)
- "available {Month} {YYYY}"
filterByAvailability(listings, targetDateStr, windowDays = 30)
If targetDateStr is missing β no filtering.
Otherwise keep a listing if:
- Parsed availability is null (unknown → keep), or
- Parsed date ∈ [target - 30 days, target + 30 days] (inclusive).
This runs after scoring; it can remove rows that were scored.
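A minimal sketch of the keep/drop rule follows. To stay self-contained it takes the parser as a fourth parameter and reads listing.description; both are assumptions of this example, since the real function has the three-argument signature shown above and calls parseAvailability itself.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// parse: (text) => Date | null, injected here for illustration only.
function filterByAvailability(listings, targetDateStr, windowDays = 30, parse = () => null) {
  if (!targetDateStr) return listings; // no target: no filtering
  const target = new Date(targetDateStr).getTime();
  return listings.filter((listing) => {
    const date = parse(listing.description || '');
    if (date === null) return true; // unknown availability: keep
    return Math.abs(date.getTime() - target) <= windowDays * DAY_MS; // inclusive window
  });
}
```

With a target of 2026-06-01, a listing available 2026-07-01 (exactly 30 days later) is kept, while one available 2026-09-01 is removed.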
Price history and drops (pipeline/price-history.js)
- Store: .price-history.json, object keyed by listing id → array of { date, price } (trimmed to last 30 entries per id on save).
- Record: For each listing with id + price, append an entry if the latest entry is not same price and same calendar day.
- Drop detection: For "today", look at entries with date < today. If max(previous prices) > current price, emit a drop with dropPct = round(100 * (old - new) / old).

Enrichment-only fields (not part of score.total)
When Maps is enabled (pipeline/enrich.js + detail scraper), listings may gain:
transitMinutes, geocoding, review stats, rent stabilization hint (PLUTO), HPD violations / bedbug flags, housing court litigation counts, open house, broker contact, etc.
These feed dashboard filters and badges but are separate from the numeric score unless a future rule references them.
Dashboard-only behavior (/apartments)
Filters (chip bar)
All are client-side on the loaded listing set; they do not change results.json.
Examples: 1BR/2BR, in-unit laundry, elevator, rent-stab flags, HPD clean, no bedbugs, no fee, no litigation, photo count ≥ 5, open house, pipeline status, stale, favorites, high floor (≥4), outdoor space, has notes, etc. Exact predicates mirror page.tsx (search for filter ===).
Sort order (full)
The list is sorted in page.tsx with this precedence (all client-side):
1. Open house: Listings with openHouse set float to the top.
2. Pipeline status: Higher PIPELINE_PRIORITY first: applied (7) → offered → toured → showing → contacted → interested → none (1) → passed (0).
3. Roommate mode: If viewMode === 'roommate' and a roommate profile is active: roommateScore.mustHavesPct descending, then price ascending.
4. Chosen sort chip: Otherwise: score (score.total desc), price_asc / price_desc, fresh (pubDate desc), commute_asc (transitMinutes asc; missing → large sentinel), value_asc (vsMedianPct asc; more negative = further below median).
When roommate mode is on, the sort chip row is disabled (sortDisabled); step 3 applies instead of step 4.
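The precedence can be sketched as one comparator. This is an illustration, not the code in page.tsx: the intermediate priority values are inferred from the ordering given above, the listing field name status and the 9999 sentinel are assumptions, and the fresh and value_asc chips are left out for brevity.

```javascript
// Values between applied (7) and none (1) are inferred from the stated order.
const PIPELINE_PRIORITY = {
  applied: 7, offered: 6, toured: 5, showing: 4,
  contacted: 3, interested: 2, none: 1, passed: 0,
};

const SORTS = {
  score: (a, b) => b.score.total - a.score.total,
  price_asc: (a, b) => a.price - b.price,
  price_desc: (a, b) => b.price - a.price,
  // Missing transit time sorts last via a large sentinel (value assumed).
  commute_asc: (a, b) => (a.transitMinutes ?? 9999) - (b.transitMinutes ?? 9999),
};

function makeComparator({ viewMode, hasRoommate, sort }) {
  return (a, b) => {
    // 1. Open house listings float to the top.
    const openHouse = Number(Boolean(b.openHouse)) - Number(Boolean(a.openHouse));
    if (openHouse !== 0) return openHouse;
    // 2. Higher pipeline priority first.
    const status =
      PIPELINE_PRIORITY[b.status ?? 'none'] - PIPELINE_PRIORITY[a.status ?? 'none'];
    if (status !== 0) return status;
    // 3. Roommate mode replaces the sort chip.
    if (viewMode === 'roommate' && hasRoommate) {
      return (
        b.roommateScore.mustHavesPct - a.roommateScore.mustHavesPct ||
        a.price - b.price
      );
    }
    // 4. Otherwise apply the chosen sort chip.
    return SORTS[sort](a, b);
  };
}
```

Usage would look like listings.sort(makeComparator({ viewMode: 'solo', hasRoommate: false, sort: 'score' })), where 'solo' stands in for any non-roommate view mode.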
Income qualifier (display only)
incomeQualifies(rent, income) in page.tsx:
- No income → unknown
- maxRentFor40x = income / 40 (40× rent rule)
- rent ≤ maxRentFor40x → ok
- rent ≤ maxRentFor40x * 1.5 → borderline
- else → guarantor
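The rule above translates almost directly into code. The sketch below is a plain JavaScript rendering that assumes income is an annual figure and rent is monthly, which is what the 40× rule implies:

```javascript
// rent: monthly rent; income: annual income (0, null, or undefined means unknown).
function incomeQualifies(rent, income) {
  if (!income) return 'unknown';
  const maxRentFor40x = income / 40; // 40x rent rule
  if (rent <= maxRentFor40x) return 'ok';
  if (rent <= maxRentFor40x * 1.5) return 'borderline';
  return 'guarantor';
}
```

With an income of $120,000, rents up to $3,000 are ok, rents up to $4,500 are borderline, and anything above that is labeled guarantor.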
Neighborhood stats panel
Aggregates currently loaded non-dismissed listings: count, average price, average score, average transit, median days on market. All are computed in the browser, independent of pipeline medians.
Quick reference: files
- pipeline/index.js
- pipeline/scorer.js
- pipeline/neighborhoods.js
- pipeline/store.js
- pipeline/availability.js
- pipeline/price-history.js
- pipeline/dismissed.js
- app/apartments/page.tsx

Product Requirements Document: AptHunter MVP

Version: 2.0 (Post-Engineering Review)
Author: Product
Last Updated: March 2026
Target Launch: June 2026 (90-day development window)
Status: Final, Approved for Development

Changelog from v1.0: Database migration timeline corrected (1 week → 2.5 weeks), Google Maps caching strategy explicitly defined, RentHop API dependency removed pending confirmation, auth locked to Clerk, Stripe billing scope expanded, listing expiration signal removed from digest, waitlist landing page added as day-1 pre-launch task, onboarding address field updated to use autocomplete.
1. Overview
Product Vision
AptHunter is the AI-first apartment search tool for renters in high-friction urban markets. Where incumbents (StreetEasy, Zillow) optimize for listing volume and broker relationships, AptHunter optimizes for renter outcomes: finding the best apartment for *your* criteria, faster, with information that currently requires hours of manual research.

The core promise to a user: *"Tell us what matters to you. We'll score every listing against it and tell you when to move."*
Problem Statement
A renter actively searching for an apartment in NYC spends an estimated 3–5 hours per week across manual research tasks: checking multiple listing platforms, looking up commute times, researching broker fee status, assessing rent stabilization, and tracking price changes in a personal spreadsheet. This is unstructured, error-prone work that provides no compounding advantage; every new listing restarts the process.

No current product addresses this holistically. Partial solutions exist but none combine them into a coherent renter workflow.
Success Metrics (90-day targets)
2. User Personas
Primary: The Active NYC Renter
Profile: 25–38, professional, relocating or lease-ending in 60–90 days. Has income to be selective but no time to waste. Searches after work, stressed by the volume of options and the speed of the market.

Jobs to be done:
- Find apartments that match *my specific* criteria without manually checking 5 platforms
- Know immediately whether a listing is worth my time (commute, no-fee, rent-stab status)
- Get alerted when something new or something better appears
- Track what I've seen and dismissed without a personal spreadsheet
Secondary: The NYC Mover with a Roommate
Profile: Same as above but coordinating with a roommate or partner with different criteria. *(Roommate collaborative mode is post-MVP but the data model should not preclude it.)*

3. MVP Feature Set
Features marked [POST-MVP] are intentionally deferred.
Feature 1: Renter Profile & Criteria Setup
Priority: P0

Users define their search criteria once via a guided 9-step onboarding flow:
1. Welcome + value prop
2. Work address input *(Google Places Autocomplete; users should not need to type exact addresses)*
3. Commute mode (subway, walking, bike, bus)
4. Max acceptable commute time
5. Budget range
6. Bedroom count
7. Move-in date range
8. Must-have amenities (laundry, elevator, dishwasher, pet-friendly, etc.)
9. Dealbreakers (optional)
At completion: user lands on their scored feed immediately. No blank state.
Saved to user's account (database-backed). Used to score listings and power the daily digest.
Feature 2: AI-Scored Listing Feed
Priority: P0

Every listing in the feed is scored across three dimensions:
- Commute score (0–100): Real transit time via Google Maps Directions API from listing address to user's work address, normalized against user's stated max. Displayed as a colored badge: 🟢 18 min / 🟡 32 min / 🔴 47 min.
- Must-haves score (0–100): % of user's non-negotiable criteria met (no-fee, laundry, pet policy, etc.)
- Rent stabilization probability (Low / Medium / High): Derived from the NYC PLUTO heuristic: pre-1974 construction, 6+ units, building class C/D/S.

A composite AptScore (0–100) is calculated per listing and used for default sort.
Freemium gate: Commute score visible on first 3 listings. Full AptScore and rent stab probability require Pro.
Feature 3: Price Drop Alerts
Priority: P0

Price drop detection is powered by a price_history table that records (listing_id, price, observed_at) every pipeline run. A drop is detected when the most recent price is lower than the first recorded price for that listing.
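The detection rule stated here (most recent price lower than the first recorded price) can be sketched as below. The function name, the return shape, and the pct field are illustrative assumptions; rows are assumed to be plain objects with the price and observed_at columns.

```javascript
// history: array of { price, observed_at } rows for a single listing.
function detectPriceDrop(history) {
  if (history.length < 2) return null; // nothing to compare yet
  const sorted = [...history].sort(
    (a, b) => new Date(a.observed_at) - new Date(b.observed_at)
  );
  const first = sorted[0].price;
  const latest = sorted[sorted.length - 1].price;
  if (latest >= first) return null;
  return {
    from: first,
    to: latest,
    pct: Math.round((100 * (first - latest)) / first), // illustrative field
  };
}
```

Note that under this rule a listing that rose and then fell back to a price still above its first recorded price would not count as a drop.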
Important cold-start note: Price history begins accumulating from the day the pipeline runs. Price drop alerts will not trigger until at least 7 days of history exist. This is expected behavior; users should be informed during onboarding ("We'll alert you if any of your saved listings drop in price, typically within 1–2 weeks as we build your price history.").
When a price drop is detected on a saved listing:
- In-app notification (badge on listing)
- Email notification (daily batch by default, instant opt-in)
- Price history sparkline chart displayed on listing card
Pro feature only.
Feature 4: Daily Digest
Priority: P1

A daily email summarizing:
- New listings matching the user's criteria added since their last session
- Price drops on saved listings
- ~~Listings about to expire~~ *(removed: StreetEasy does not provide expiration data; we cannot reliably predict this)*
Email sent at 8:00 AM via Resend. Personalized per user profile. Includes direct links back to the app.
The digest is the primary retention driver for days when users don't open the app.
Pro feature only.
Feature 5: Listing Management
Priority: P1

Users can:
- Save a listing (persisted to database, powers price drop alerts and digest)
- Dismiss a listing (excluded from future feed views permanently)
- Mark as contacted with optional timestamp and notes
- View activity log: all listings viewed, saved, dismissed in this search cycle
Feature 6: Waitlist Landing Page
Priority: P0. Ship on day 1 of development, not week 11.

A single-page marketing site collecting email addresses before the app launches. This page runs independently of app development and feeds the launch-day waitlist.
Contents:
- One-sentence value prop
- Three feature highlights (commute score, rent stab, daily digest)
- Email capture form
- "NYC only, more cities coming" note

Built with Next.js (or a static page on Vercel). No design overhaul required; functional is enough.
Feature 7: Subscription & Payments (Stripe Billing)
Priority: P0

Implementation note: This is Stripe Billing (subscriptions product), not a one-time Stripe Payment. The implementation scope includes:
- Stripe Billing subscription creation (monthly, $15/month, cancel anytime)
- Stripe webhook handling: customer.subscription.created, customer.subscription.deleted, customer.subscription.updated, invoice.payment_failed
- User entitlement system: a subscription_status field on the users table, kept in sync via webhooks in real time. App reads from DB, not from Stripe on every request.
- Grace period: 3-day grace period on failed payment before downgrading to free tier
- Idempotency on webhook processing (Stripe may deliver webhooks multiple times)
- Self-serve cancellation flow in account settings
Estimated scope: 4–5 days. Do not underscope this.
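To make the idempotency and entitlement requirements concrete, here is a minimal sketch of a webhook dispatcher. It relies only on the id, type, and data.object fields that Stripe events carry; the db object, its setStatus and startGracePeriod methods, the in-memory Set, and the 'free' status value are all hypothetical stand-ins. Signature verification and persistent storage of processed event ids are deliberately omitted.

```javascript
// db is a hypothetical data-access object supplied by the caller.
function createWebhookHandler(db) {
  const processed = new Set(); // in production this would be a DB table

  return function handleEvent(event) {
    // Stripe may deliver the same event more than once: skip repeats.
    if (processed.has(event.id)) return 'duplicate';
    processed.add(event.id);

    const obj = event.data.object;
    switch (event.type) {
      case 'customer.subscription.created':
      case 'customer.subscription.updated':
        db.setStatus(obj.customer, obj.status);
        return 'processed';
      case 'customer.subscription.deleted':
        db.setStatus(obj.customer, 'free'); // hypothetical status value
        return 'processed';
      case 'invoice.payment_failed':
        db.startGracePeriod(obj.customer, 3); // 3-day grace period
        return 'processed';
      default:
        return 'ignored';
    }
  };
}
```

One design point to settle in a real implementation is whether an event is marked processed before or after the database write succeeds; this sketch marks it first, which would drop an event if the write then failed.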
Tiers:
4. Technical Architecture
Authentication
Decision: Clerk (locked)

Clerk is the correct choice for this team and timeline. Rationale:
- Managed auth-as-a-service: sign-in, sign-up, email verification, session management handled
- ~1 day integration vs ~1.5 weeks for NextAuth done properly
- Pre-built Next.js middleware and React components
- Cost: $25/month at our scale (acceptable)
Database Schema (Key Tables)
```
users
  id, clerk_user_id, email, subscription_status, stripe_customer_id, created_at

profiles
  id, user_id (FK), work_address, work_lat, work_lng, commute_mode, max_commute_min,
  budget_min, budget_max, bedrooms, move_in_start, move_in_end, amenities (jsonb),
  dealbreakers (jsonb), created_at, updated_at

listings
  id, external_id, source (streeteasy|renthop), address, lat, lng, price, bedrooms,
  bathrooms, sqft, no_fee, amenities (jsonb), url, first_seen_at, last_seen_at, active

price_history
  id, listing_id (FK), price, observed_at

user_listings
  id, user_id (FK), listing_id (FK), status (saved|dismissed|contacted),
  commute_minutes (cached), aptscore (cached), notes, updated_at
```

Google Maps Caching Strategy

Decision: On-demand compute with (address, work_address, mode) tuple cache

We do NOT pre-compute commute times for all listings at pipeline ingestion. That approach costs $2,500+/day at scale (see engineering review). Instead:
1. When a user views a listing, check user_listings for a cached commute_minutes value
2. If not cached: call Google Maps Directions API, store result in user_listings.commute_minutes
3. Cache is scoped per (user, listing), because different users have different work addresses
4. Cache TTL: 7 days (commute time for a given address pair rarely changes)

Estimated cost at 500 active users: ~$150–300/month. Acceptable.
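The read-through pattern in the steps above can be sketched with an in-memory Map standing in for the user_listings table. Everything named here (createCommuteCache, computeMinutes, the injectable now clock) is a hypothetical illustration; a real implementation would call the Directions API asynchronously and persist the value plus a timestamp in Postgres.

```javascript
const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day cache TTL

// computeMinutes(userId, listingId) stands in for the Directions API call.
function createCommuteCache(computeMinutes, now = () => Date.now()) {
  const cache = new Map(); // key: "userId:listingId"

  return function getCommuteMinutes(userId, listingId) {
    const key = `${userId}:${listingId}`;
    const hit = cache.get(key);
    // Serve the cached value while it is younger than the TTL.
    if (hit && now() - hit.storedAt < TTL_MS) return hit.minutes;
    // Miss or expired: compute once and store with a timestamp.
    const minutes = computeMinutes(userId, listingId);
    cache.set(key, { minutes, storedAt: now() });
    return minutes;
  };
}
```

Scoping the key per (user, listing) means two users viewing the same listing each trigger one lookup, which matches step 3.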
Data Pipeline
Listing Ingestion:
- Primary source: StreetEasy web scraper (residential IP, rate-limited to 1 req/3s)
- Fallback: StreetEasy scraper via Browserless.io (managed headless browser, avoids IP blocking); evaluate cost/benefit at first blocking event
- RentHop as secondary source: pending confirmation of API access terms. Do not treat as confirmed fallback until contracts/terms reviewed.
- Refresh cadence: every 4 hours
Enrichment (per new listing, at ingestion time, not per user):
- NYC Planning Labs GeoSearch → BBL
- NYC MapPLUTO → rent stab heuristic
- Google Places API v1 → building reviews + Gemini summary
Note: Commute time is NOT computed at ingestion. It is computed on-demand per user, per listing viewed (see caching strategy above).
Stack
- Frontend: Next.js
- Backend: Next.js API routes + Node.js pipeline scripts
- Database: PostgreSQL (Supabase or Railway β managed Postgres)
- Auth: Clerk
- Payments: Stripe Billing
- Hosting: Vercel (frontend), Railway (pipeline cron)
- Email: Resend
- Scraping resilience: Browserless.io (standby, activate on first Cloudflare block)
5. User Stories (MVP)
6. Out of Scope (MVP)
Explicitly deferred:
- iOS / Android native apps
- Multi-city support
- Roommate collaborative mode
- HPD violations data
- In-app messaging
- Landlord-facing tools
- Social features
7. Revised Timeline
*Adjusted per engineering review. Database migration and Stripe billing scoped correctly.*
Target: soft launch end of week 12 (late June 2026).
8. Open Questions (Resolved)
*PRD v2.0: Approved. Engineering may proceed.*
Engineering Review: AptHunter PRD v1.0
Reviewer: Senior Engineer
Date: March 2026
Status: Feedback Round 1, Requires Revision Before Implementation

Hey, I read through both docs. Concept is solid and I want to build it. But there are several things in the PRD that are either underspecified, optimistically scoped, or will cause real pain in implementation. Writing this up formally because I don't want to discover these things in week 7.
Critical Issues (Blockers)
1. "1 week for database migration" is not realistic
The PRD says: *"Multi-user architecture: requires database migration before public launch. Estimated: 1 week."*
This is not a migration. This is a rewrite of the data layer. Right now everything is keyed on flat JSON files per session. To support multiple users you need:
- A users table
- A profiles table (one per user, or many-to-one for roommate mode)
- A listings table (shared, enriched data)
- A user_listings table (per-user save/dismiss/contact status)
- A price_history table (per listing, time-series)
- A sessions/auth table
Then you need to rewrite every API route that currently reads from profile.json or results.json to query Postgres instead. And the pipeline that writes to results.json needs to write to the DB instead, scoped per user profile.
This is probably 2.5–3 weeks of work, not 1. If we compress it to 1 week we will have data model problems we're fixing for the rest of the project. Please revise the timeline.
2. The Google Maps cost math is probably wrong at scale
The PRD calculates: *"500 users × 20 listings/session × 4 sessions/month = 40,000 requests/month = $200/month."*
This assumes we only call the Directions API when a user is in a session actively viewing listings. But the PRD also says we should "pre-compute for popular listings at pipeline ingestion."
If we pre-compute at ingestion:
- NYC has approximately 3,000–5,000 active rental listings on StreetEasy at any time
- We refresh every 4 hours = 6 refreshes/day
- Even if only 20% of listings are new per refresh: 1,000 new listings/day
- Each listing needs a Directions API call PER USER PROFILE (because commute is origin-dependent)
- At 500 users with distinct work addresses: 500 × 1,000 = 500,000 calls/day
That is $2,500/day, not $200/month. Clearly we cannot pre-compute per-user.
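For anyone who wants to check the arithmetic, here it is spelled out. The per-call price is not stated anywhere; it is the rate implied by the PRD's own "$200 for 40,000 requests" figure, so treat it as an assumption.

```javascript
// On-demand estimate from the PRD: users x listings/session x sessions/month.
const onDemandCallsPerMonth = 500 * 20 * 4;

// Implied price per 1,000 calls, derived from the PRD's $200/month figure.
const impliedCostPerThousand = 200 / (onDemandCallsPerMonth / 1000);

// Pre-compute scenario: user profiles x new listings per day.
const precomputeCallsPerDay = 500 * 1000;
const precomputeCostPerDay = (precomputeCallsPerDay / 1000) * impliedCostPerThousand;

console.log({
  onDemandCallsPerMonth,
  impliedCostPerThousand,
  precomputeCallsPerDay,
  precomputeCostPerDay,
});
```

At the same implied rate, one day of per-user pre-compute costs more than twelve months of the on-demand estimate.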
The correct approach: compute commute time on-demand when a user views a listing, cache the result by (listing_address, work_address, transit_mode) tuple. The cache key is user-agnostic if two users share a work address. Estimated real cost at 500 active users: ~$150–300/month. But this needs to be explicitly architected, not left as an open question.
3. "RentHop API has official public endpoints": does it actually?
The PRD states this as fact. I can't find documented public API access for RentHop. Their API appears to be available only through partnership agreements, not as a self-serve developer API. This matters because it's listed as our primary fallback if StreetEasy scraping breaks.
Before we treat this as a fallback, someone needs to actually confirm: (a) RentHop has a public or partner API, (b) what the terms are, (c) what it costs. If this doesn't exist, we're more exposed on data fragility than the PRD acknowledges. Please remove this as a stated fallback until confirmed.
4. Price history requires storage from day one, and this affects the timeline
The PRD lists price drop alerts as P0 but the database schema work is described as week 1–2. For price drop detection to work, we need:
- A price_history table that records (listing_id, price, observed_at) every time the pipeline runs
- At least 1–2 weeks of historical data before we can detect *any* drops
- This means the feature is only testable starting ~2 weeks after the DB is running
This isn't a blocker per se, but the timeline doesn't account for the cold-start period on price history. Users who sign up in week 12 will not see price drop functionality working correctly until week 14. This should be communicated in the launch plan.
Significant Issues (Need Answers Before Build)
5. Authentication: NextAuth vs Clerk, we need to pick one now
The PRD says "NextAuth.js or Clerk." This is not a trivial choice:
- NextAuth: Open source, self-hosted, more configuration work, requires us to handle session storage, email verification flows, etc. Probably 1.5 weeks of setup to do properly.
- Clerk: Managed auth-as-a-service, ~1 day of integration, $25/month at our scale. Pre-built UI components for sign-in/sign-up.
Given the team size and timeline, Clerk is the right call. But this decision needs to be made in week 1, not discovered as a debate in week 2. Recommend locking Clerk in the PRD and removing ambiguity.
6. Stripe: Subscriptions vs one-time payments, a meaningful difference
"Stripe integration skeleton" is vague. For a monthly subscription model we need:
- Stripe Billing (subscriptions product, not payments)
- Webhook handling for subscription created, updated, cancelled, payment failed
- User entitlement system: how does our app know a user is on Pro vs Free in real time?
- Grace period handling for failed payments
This is probably 4–5 days of work to do properly, not a skeleton. The webhook handling alone has meaningful edge cases (what happens if a webhook is delayed? what if we process it twice?). Underspecifying this creates billing bugs that are very hard to debug in production.
7. The "4 AI-augmented devs" estimation problem
The timeline estimates work for a 4-person team. Two of those developers are "primarily AI-assisted." I have no problem with this in principle; AI tools genuinely accelerate development for well-specified tasks. But I want to flag:
- AI-assisted development is fastest on greenfield code with clear specs. This PRD has several underspecified areas (see above). Unclear specs fed to AI tools produce plausible-looking code with subtle bugs.
- The DB migration and Stripe webhook work specifically need a human reviewing the output carefully. These are the two places where "it looks like it works" bugs have the most consequences (data loss, billing errors).
Not a blocker, just a request to make sure the AI-assisted devs are working from tighter specs on the critical paths.
Minor Issues
8. "Soft launch to waitlist": where does the waitlist come from?
The timeline ends with "soft launch to waitlist" in week 12. There is no mention of waitlist collection anywhere in the product spec. If we want 500 users at launch, we need to be collecting emails starting now, not in week 11. This should be a marketing task that starts in parallel with development; a simple landing page with email capture is a 1-day build.
9. The onboarding flow assumes users know their exact work address
Step 2 of onboarding: *"Work address input."* Most users know their office building or neighborhood but not the exact street address. Consider: autocomplete (Google Places Autocomplete API), and making this field optional at onboarding with a prompt to add later. Blocking activation on an exact address input will hurt conversion.
10. "Listings about to expire from the market" in daily digest: how do we know when a listing will expire?
The daily digest spec includes: *"Listings about to expire from the market (high urgency signal)."* StreetEasy doesn't provide listing expiration dates. We can approximate staleness by detecting when a listing disappears from our scrape, but by definition we only know a listing is gone *after* it's gone. We cannot predict expiration. Remove this feature from the digest spec or reframe it as "listings that have been active for 30+ days with recent price drops" (which we can calculate).
Summary
The PRD is directionally correct and I support the build. But before we start week 1, I need:
1. Timeline revised: DB migration = 2.5 weeks, Stripe = 5 days, push overall timeline accordingly
2. RentHop API confirmed or removed as a stated fallback
3. Google Maps caching strategy explicitly specified (on-demand with tuple cache, per my note above)
4. Auth locked to Clerk
5. Listing expiration signal removed from digest spec
6. Waitlist landing page added as a pre-launch task (day 1, not week 11)

Happy to talk through any of this. I want to ship this; I just want to ship it without the landmines.

*Engineering Review v1: Respond with revised PRD or async comments*

AptHunter: Proof of Concept Overview

Status: Working PoC · NYC · Single-user
Last Updated: March 2026

What It Is
AptHunter is a working, end-to-end apartment intelligence pipeline. It scrapes live NYC listings from StreetEasy, enriches each one with real data from Google Maps and NYC public databases, scores them against a saved renter profile, and surfaces the top matches via a Telegram daily digest and a local web dashboard (Mission Control).
This is a functioning proof of concept, not a prototype or mockup. It runs on a live cron schedule, pulls real listings, computes real commute times, and detects rent stabilization using real city data.

How It Works: End to End
```
StreetEasy (scraped)
        │
        ▼
[ scraper.js ]          Playwright + stealth plugin
  Raw listings:         address, price, bedrooms,
                        title, description, neighborhood, listing URL
                        broker fee status
        │
        ▼
[ scorer.js ]           Must-have filter (price, beds,
  Hard filter           laundry, full bath)
        │
        ▼
[ enrich.js ]           Google Maps API:
  Geocoding +           - Geocode address → lat/lng
  Transit time          - Directions API → transit time
  Building reviews      - Places API v1 → reviews +
                          Gemini AI summary
        │
        ▼
[ rent-stab.js ]        Free NYC public APIs:
  Rent stabilization    - Planning Labs GeoSearch → BBL
  detection             - MapPLUTO Open Data → year built,
                          unit count, building class
                        → Heuristic: pre-1974 + 6+ units
                          + multi-family class = likely RS
        │
        ▼
[ detail-scraper.js ]   Second Playwright pass:
  Detail enrichment     - Open house dates
                        - Broker contact info
        │
        ▼
[ scorer.js ]           Composite AptScore (0–100):
  Scoring               - Must-haves % met
                        - Commute score
                        - Price value vs. budget
                        - Freshness
                        - Neighborhood score
        │
        ▼
[ price-history.js ]    Price drop detection:
  Price tracking        Records price per listing per run.
                        Detects and flags drops with $ amount
                        and % change.
        │
        ▼
[ notify.js ]           Telegram message with:
  Telegram digest       - Top 5 listings scored + ranked
                        - Address, price, bedrooms
                        - Commute time (Google Maps)
                        - Laundry, elevator status
                        - Rent stab probability badge
                        - Price drop alerts
                        - Direct link to listing
        │
        ▼
[ results.json ]        Consumed by Mission Control
  Dashboard output      web UI at localhost:3000/apartments
```
Data Sources
Key Files
- Orchestrator: runs the full pipeline, writes results
- StreetEasy scraper (Playwright + stealth plugin)
- Must-have filtering + composite AptScore calculation
- Google Maps enrichment (geocode, transit, reviews)
- Rent stabilization detection via PLUTO heuristic
- Second-pass scraper for open house + broker contact
- Price tracking + drop detection across runs
- Telegram digest formatter + sender
- Neighborhood scoring (transit access, walkability)
- Move-in date filter
- Tracks user-dismissed listings (excluded from future runs)
- Tracks seen listings (prevents duplicate alerts)
- Renter criteria: budget, bedrooms, commute, preferences
- Output: consumed by Mission Control dashboard
- Shell script run by LaunchAgent cron at 8AM daily

Renter Profile
The system is profile-driven. Paula's current active profile:
```json
{
  "name": "dream",
  "label": "Dream Apt",
  "maxRent": 5000,
  "minBedrooms": 2,
  "maxBedrooms": 2,
  "requirements": {
    "fullBath": true,
    "laundry": "in-building"
  },
  "commute": {
    "destination": "Flatiron",
    "goodLines": ["F", "M", "N", "R", "1", "2", "3", "6"],
    "maxWalkMinutes": 10
  }
}
```
A second "backup" profile (1BR, $3,800 max) is also defined. A roommate mode is active: each listing is scored against both Paula's criteria and Sam's, with compatibility badges (both / paula-only).
Scoring Logic (AptScore 0–100)
Rent Stabilization Detection
Uses two free NYC public APIs, no key required:
1. NYC Planning Labs GeoSearch: resolves address to a BBL (Borough Block Lot identifier)
2. NYC MapPLUTO: queries BBL for year built, number of units, building class
Heuristic:
- Likely RS: pre-1974 construction + 6+ units + multi-family rental class (C/D/S)
- Check 421-a: post-1974 + large building (may have tax exemption requiring RS)
- Unknown: condo, co-op, single-family, or insufficient data
Caveat: DHCR registration is voluntary and incomplete. This heuristic is as good as public data allows. Ground truth requires a formal HCR rent history request per unit.
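The heuristic can be sketched as a small classifier over the three PLUTO fields. This is an illustration only: the function name, the return labels, and especially LARGE_BUILDING_UNITS are assumptions, since the source does not say what counts as a "large building". The sketch also assumes the building class check applies to both outcomes and that class codes start with the class letter.

```javascript
// Placeholder threshold: the document does not define "large building".
const LARGE_BUILDING_UNITS = 50;

function classifyRentStab({ yearBuilt, units, buildingClass }) {
  // Insufficient data: cannot say anything.
  if (yearBuilt == null || units == null || !buildingClass) return 'unknown';
  // Multi-family rental classes are assumed to start with C, D, or S.
  const multiFamilyRental = /^[CDS]/.test(buildingClass);
  if (!multiFamilyRental) return 'unknown'; // condo, co-op, single-family
  if (yearBuilt < 1974 && units >= 6) return 'likely-rs';
  if (yearBuilt >= 1974 && units >= LARGE_BUILDING_UNITS) return 'check-421a';
  return 'unknown';
}
```

As the caveat says, any label produced this way is a hint for further checking, not a determination of a unit's legal status.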
How It Runs
Daily cron via macOS LaunchAgent:
```
08:00 AM → run-daily.sh → node pipeline/index.js --all
  → StreetEasy scrape
  → Google Maps enrichment
  → Rent stab check
  → Score + rank
  → Telegram digest to Paula
  → results.json → Mission Control
```
Manual runs:
```bash
node pipeline/index.js --all              # Full run, all listings
node pipeline/index.js --dummy            # Use test data, no API calls
node pipeline/index.js --dry-run          # Run everything, skip Telegram
node pipeline/index.js --profile backup   # Use backup profile
```
What's Built vs. What's Next
Working now:
- ✅ StreetEasy scraper (Playwright stealth, residential IP)
- ✅ Google Maps enrichment (geocode + transit + Places reviews)
- ✅ Rent stabilization detection (PLUTO heuristic)
- ✅ Composite scoring with roommate mode
- ✅ Price drop detection and alerting
- ✅ Daily Telegram digest (top 5 listings)
- ✅ Mission Control web dashboard (localhost:3000/apartments)
- ✅ Dismiss listings, saved profiles, move-in date filter
- ✅ Onboarding flow at /onboarding
Not yet built (roadmap):
- ⬜ User accounts / auth (currently single-user)
- ⬜ Stripe subscription / freemium paywall
- ⬜ Email digest
- ⬜ Public URL / deployment
- ⬜ Multi-city support
Tech Stack
*Screenshots and demo video available in the project docs folder.*