
Under the hood of Assessorly — TypeScript, Bash, and the math
We don’t guess—we measure. From the first click, we use maps and math to find truly comparable sales—nearby, recent, and similar in features. Each candidate is scored on distance, recency, and property differences, so you get defensible results you can see. We’re your advocate, not a black box.
Data layer: Amazon RDS for MariaDB 11.6 (InnoDB), B-trees, and spatial indexes (R-tree)
- Engine: Amazon RDS for MariaDB 11.6 (InnoDB)
- B-tree for equality/range filters: e.g., `(county_id)`, `(parcel_id, sale_date)`, `(year_built)`, `(gla)`
  - Point lookup: O(log_B N)
  - Range lookup (k rows): O(log_B N + k)
  - Space: O(N)
- Spatial index (R-tree) on `POINT` for bounding-box prefilter
  - Avg query: O(log_M N + c) (c = candidates)
  - Worst-case: O(N)
- Primary keys: `BINARY(16)` UUIDs
- Geo: `POINT` (SRID=4326) for units/parcels
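The bounding-box prefilter is what the R-tree can answer quickly; the exact distance pass happens afterwards. Below is a hypothetical TypeScript helper (not the production query code) that derives such a box from a center point and radius:

```typescript
// Hypothetical helper: derive a lat/lng bounding box for a search radius.
// The box feeds the spatial (R-tree) index; precise haversine distance is
// applied afterwards to the candidates it returns.
const EARTH_RADIUS_M = 6_371_000;

interface BBox {
  minLat: number;
  maxLat: number;
  minLng: number;
  maxLng: number;
}

function boundingBox(lat: number, lng: number, radiusM: number): BBox {
  // Degrees of latitude per meter are constant on the sphere.
  const dLat = (radiusM / EARTH_RADIUS_M) * (180 / Math.PI);
  // Longitude degrees shrink with latitude; guard against the poles.
  const dLng = dLat / Math.max(Math.cos((lat * Math.PI) / 180), 1e-12);
  return {
    minLat: lat - dLat,
    maxLat: lat + dLat,
    minLng: lng - dLng,
    maxLng: lng + dLng,
  };
}
```

The box deliberately over-selects (it circumscribes the search circle), which is exactly what a prefilter should do: no true comparable is excluded, and the O(c) precise-distance pass trims the excess.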
Comparable selection—algorithm & complexity
Staged pipeline: hard filters → spatial prefilter → precise distance → score/rank → top-k.
Scoring
\[ S(i) = w_d \cdot \frac{1}{1 + \operatorname{dist}(t,i)} + w_r \cdot e^{-\lambda \cdot \Delta\text{days}(i)} - w_g\left|\frac{GLA_i - GLA_t}{GLA_t}\right| - w_y \lvert year_i - year_t \rvert - w_a \lvert acres_i - acres_t \rvert \]

Weights are learned or hand‑tuned per market.
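Read literally, the score is a handful of arithmetic per candidate. A minimal TypeScript sketch, with illustrative (not production) weights and field names:

```typescript
// Sketch of the comp score S(i). Field names and weight values are
// illustrative assumptions, not the production schema.
interface Target {
  gla: number;       // gross living area, sq ft
  yearBuilt: number;
  acres: number;
}

interface Comp extends Target {
  distM: number;     // distance to the target, meters
  daysAgo: number;   // days since the comparable sale
}

interface Weights {
  wd: number; wr: number; wg: number; wy: number; wa: number;
  lambda: number;    // recency decay rate
}

function compScore(t: Target, c: Comp, w: Weights): number {
  return (
    w.wd / (1 + c.distM) +                              // proximity reward
    w.wr * Math.exp(-w.lambda * c.daysAgo) -            // recency decay
    w.wg * Math.abs((c.gla - t.gla) / t.gla) -          // GLA penalty
    w.wy * Math.abs(c.yearBuilt - t.yearBuilt) -        // age penalty
    w.wa * Math.abs(c.acres - t.acres)                  // lot-size penalty
  );
}
```

A same-day sale of an identical property next door scores the maximum, w_d + w_r; every difference only subtracts from there, which keeps the ranking easy to explain.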
Great-circle distance (haversine)
\[ d = 2R\,\arcsin\!\left(\sqrt{\sin^2\!\tfrac{\Delta\varphi}{2} + \cos\varphi_t\cos\varphi_i\,\sin^2\!\tfrac{\Delta\lambda}{2}}\right),\quad R \approx 6{,}371{,}000\,\text{m} \]

Complexity
- Hard SQL filters (B-tree): O(log N + k), returning k rows
- Spatial prefilter (R-tree bbox): O(log N + c) candidates
- Precise distance for c: O(c)
- Top-k selection (heap): O(c log k)
- Total: O(log N + c log k)
Here N = rows, c = candidates after bbox, k = results retained, p = features, n = samples.
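The haversine formula above transcribes directly to TypeScript; this sketch is only used on the c candidates that survive the bounding-box prefilter:

```typescript
// Great-circle (haversine) distance in meters. R is the mean Earth radius,
// matching the formula's R ≈ 6,371,000 m.
const R = 6_371_000;

function haversineM(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dPhi = toRad(lat2 - lat1);       // Δφ
  const dLambda = toRad(lng2 - lng1);    // Δλ
  const a =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLambda / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```

At these radii the spherical approximation is well within appraisal tolerance; one degree of longitude at the equator comes out to about 111.2 km.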
Fuzzy address search (Levenshtein DP)
Fast, fuzzy lookups on full_address to tolerate typos, abbreviations, and partial inputs. Uses a two‑row Levenshtein DP with optional banding.
Two‑row DP (O(min(n,m)) space)
\[ D_{i,j} = \min\{ D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + [a_i \neq b_j] \}\,;\quad D_{0,j}=j,\; D_{i,0}=i \]

Per string (banded): O(d·L), where d is the band width and L the string length. Top‑k selection with a heap is O(U log k) vs. a full sort's O(U log U), where U = candidates.
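The two-row DP with an optional band fits in one small function. This is a self-contained TypeScript sketch (names are illustrative); cells farther than the band from the diagonal are treated as unreachable:

```typescript
// Two-row Levenshtein DP with an optional band. Cells more than `band`
// off the diagonal are left at Infinity, giving the O(d·L) banded cost;
// with the default band this is the classic full DP.
function levenshtein(a: string, b: string, band = Infinity): number {
  if (b.length > a.length) [a, b] = [b, a]; // rows sized by the shorter string: O(min(n, m)) space
  const n = a.length;
  const m = b.length;
  if (n - m > band) return Infinity; // the band can never reach cell (n, m)
  let prev = Array.from({ length: m + 1 }, (_, j) => (j <= band ? j : Infinity));
  let curr = new Array<number>(m + 1);
  for (let i = 1; i <= n; i++) {
    curr.fill(Infinity);
    curr[0] = i <= band ? i : Infinity;
    const lo = Math.max(1, i - band);
    const hi = Math.min(m, i + band);
    for (let j = lo; j <= hi; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1; // [a_i ≠ b_j]
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    [prev, curr] = [curr, prev];
  }
  return prev[m];
}
```

Banding pays off because typo-tolerant address search rarely cares about distances above 2 or 3: anything further is a different address, not a misspelling.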
Valuation (AVM) — multiple linear regression
Interpretable MLR with ridge regularization and sanity constraints (e.g., more GLA should not reduce value).
Model
\[ y = \beta_0\,\mathbf{1} + X\,\boldsymbol{\beta} + \varepsilon \]

\[ \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \; \lVert y - X\boldsymbol{\beta} \rVert_2^2 + \lambda\, \lVert \Gamma\boldsymbol{\beta} \rVert_2^2 \]

Training complexity (QR/SVD): O(n p^2 + p^3). Prediction: O(p) per unit.
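For illustration only, the ridge estimate can be sketched by forming and solving the normal equations (XᵀX + λI)β = Xᵀy directly, taking Γ = I; a production path would use QR/SVD as noted above, and all names here are hypothetical:

```typescript
// Ridge regression via the normal equations with Γ = I, solved by Gaussian
// elimination with partial pivoting. A brevity sketch only; QR/SVD is the
// numerically stable choice at scale. Prepend a column of 1s to X for an
// intercept.
function ridgeFit(X: number[][], y: number[], lambda: number): number[] {
  const n = X.length;
  const p = X[0].length;
  // A = XᵀX + λI (p×p), b = Xᵀy (length p)
  const A = Array.from({ length: p }, (_, i) =>
    Array.from({ length: p }, (_, j) => {
      let s = i === j ? lambda : 0;
      for (let r = 0; r < n; r++) s += X[r][i] * X[r][j];
      return s;
    })
  );
  const b = Array.from({ length: p }, (_, i) => {
    let s = 0;
    for (let r = 0; r < n; r++) s += X[r][i] * y[r];
    return s;
  });
  // Forward elimination with partial pivoting.
  for (let col = 0; col < p; col++) {
    let piv = col;
    for (let r = col + 1; r < p; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[piv][col])) piv = r;
    }
    [A[col], A[piv]] = [A[piv], A[col]];
    [b[col], b[piv]] = [b[piv], b[col]];
    for (let r = col + 1; r < p; r++) {
      const f = A[r][col] / A[col][col];
      for (let c = col; c < p; c++) A[r][c] -= f * A[col][c];
      b[r] -= f * b[col];
    }
  }
  // Back substitution.
  const beta = new Array<number>(p).fill(0);
  for (let i = p - 1; i >= 0; i--) {
    let s = b[i];
    for (let j = i + 1; j < p; j++) s -= A[i][j] * beta[j];
    beta[i] = s / A[i][i];
  }
  return beta;
}
```

With λ = 0 this reduces to ordinary least squares; a nonzero λ shrinks coefficients toward zero, which is what makes small per-market fits stable.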
Empirical‑Bayes / fixed‑effects meta‑analysis of past reports, implemented with:
- Weighted Welford online stats to stream means/variances per feature (and intercept) globally and by region (running mean + M2 → variance = M2/weight).
- Precision‑weighted pooling to combine sources (global + region).
- Weights are the comparable count per report; 5‑minute cache on the snapshots.
This is inverse‑variance weighting (Normal–Normal conjugate update), not a random‑effects model.
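Both building blocks, the weighted Welford accumulator and the inverse-variance pool, fit in a few lines of TypeScript. This is a sketch of the technique, not the production code:

```typescript
// Weighted Welford: streams a weighted mean and M2 so that
// variance = M2 / totalWeight. Weights here stand in for the
// comparable count per report.
class WeightedWelford {
  private w = 0;    // total weight
  private mean = 0; // running weighted mean
  private m2 = 0;   // running weighted sum of squared deviations

  add(x: number, weight: number): void {
    this.w += weight;
    const delta = x - this.mean;
    this.mean += (weight / this.w) * delta;
    this.m2 += weight * delta * (x - this.mean);
  }

  get stats(): { mean: number; variance: number } {
    return { mean: this.mean, variance: this.w > 0 ? this.m2 / this.w : 0 };
  }
}

// Precision-weighted (inverse-variance) pooling of two estimates,
// e.g. the global and per-region means: the Normal–Normal conjugate update.
function pool(m1: number, v1: number, m2: number, v2: number) {
  const p1 = 1 / v1;
  const p2 = 1 / v2;
  return { mean: (p1 * m1 + p2 * m2) / (p1 + p2), variance: 1 / (p1 + p2) };
}
```

The pooled variance is always smaller than either input, which is the point: a thin regional sample borrows strength from the global prior instead of swinging on a handful of reports.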
ETL & geo ingest — streaming, concurrency, and costs
- Format: line‑delimited GeoJSON, TIGER/Line shapefiles, county JSON
- Pattern: staging → normalize → idempotent upsert (hash guard)
- Geometry: `ST_GeomFromWKB`/WKT, SRID=4326 enforced
- Concurrency: bounded workers (I/O vs DB write balance)
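The hash guard behind the idempotent upsert can be sketched as a stable digest over the normalized row; the canonicalization and digest choice (SHA-256) below are assumptions, not the production schema:

```typescript
import { createHash } from "node:crypto";

// Hash guard sketch for idempotent upserts: a stable digest of the
// normalized row. Keys are sorted so semantically identical rows always
// hash alike, letting a re-run skip unchanged rows (e.g. only upsert
// WHERE row_hash <> :hash). SHA-256 is an illustrative choice.
function rowHash(row: Record<string, unknown>): string {
  const canonical = JSON.stringify(
    Object.keys(row)
      .sort()
      .map((k) => [k, row[k]])
  );
  return createHash("sha256").update(canonical).digest("hex");
}
```

Because the digest only depends on row content, replaying a whole county file is safe: identical rows short-circuit, and only genuine changes touch the database.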
Observability: Grafana, Prometheus, GA4
- Metrics: Prometheus → Grafana (API p95, DB latency, queue depth, Longhorn volumes)
- Product analytics: Google Analytics (GA4)
Security & secrets: Vault
- Vault for scoped DB creds + API tokens, with rotation
- Ingress‑NGINX + ModSecurity; TLS via Let’s Encrypt or AWS ACM
- AWS SSM; least‑privilege IAM
AWS deployment — GitHub Actions
The AWS footprint (Route 53, ACM, S3, CI runners, etc.) is maintained in a public repo: MilesSystems/aws-deployment. Diagrams in that repo’s Diagrams/ folder are embedded below and auto‑refresh from GitHub.
Infrastructure: Harvester, Longhorn, R640/R510, AWS
- Harvester (KVM + RKE2) for local virtualization + Kubernetes
- Longhorn distributed block storage (snapshots, rebuilds, replicas)
- R640: compute / hot I/O (DB, API, AVM)
- R510: capacity / archive / bulk datasets
- AWS: Route 53, ACM, S3; self‑hosted GitHub Actions runners
Why this works (performance & explainability)
- Sub‑second search by keeping candidates c small and using heap top‑k
- Auditable regression with intervals + constraints
- Linear‑time streaming ETL
- Right‑sized hybrid infra (R640 hot‑path, R510 bulk; Longhorn resilience)
Quick reference (Big‑O)
- B‑tree point: O(log N)
- B‑tree range (k rows): O(log N + k)
- R‑tree bbox: O(log N + c) avg
- Distance calc (c): O(c)
- Top‑k heap: O(c log k)
- End‑to‑end comps: O(log N + c log k)
- MLR train (QR/SVD): O(n p^2 + p^3)
- MLR predict: O(p)
- ETL parse+write: O(N log N)
Forward paths
- Quantile regression for robust intervals
- Monotonic constraints across core features (GLA, lot size)
- Adaptive indexer driven by slow‑query logs
- Neighborhood embeddings to prune false comps
