KhmerWebSearch.com: April product and search relevance progress

KhmerWebSearch.com has crossed the line from scattered infrastructure work into a functional, live search engine stack.

Khrawler handles acquisition. The backend pipeline turns crawled pages into search-ready documents. Typesense serves the retrieval layer. The API exposes the product endpoints. Then the web UI turns the whole thing into something people can use at khmerwebsearch.com.

That is the good news. The uncomfortable news is that being live removes the luxury of vague optimism. Now the weak spots are visible.

What improved on the product side

The product is cleaner than it was a few weeks ago.

The frontend is more search-first, simpler, and more responsive. The homepage is less noisy. The pipeline is stricter about what enters the index. Search retrieval is doing more deliberate normalization, reranking, deduplication, and field weighting. Source intent and query intent are starting to get more explicit treatment instead of being left to generic matching.
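The normalization, deduplication, and field-weighting steps can be made more concrete with a minimal Python sketch. This is not the actual KhmerWebSearch.com pipeline. The characters stripped, the document fields (`title`, `headings`, `body`), and the weights are all assumptions chosen for illustration. The sketch applies Unicode NFC normalization, removes zero-width spaces (Khmer text often carries them as invisible word-break hints), and collapses whitespace. It then treats documents with the same normalized content hash as duplicates. `query_by` and `query_by_weights` are real Typesense search parameters, but the field names passed to them here are hypothetical.

```python
import hashlib
import re
import unicodedata

# Characters stripped before matching: zero-width space and BOM. Khmer text
# often carries U+200B as an invisible word-break hint (assumed list).
INVISIBLE = re.compile("[\u200b\ufeff]")
WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Canonical form used for matching and duplicate detection."""
    text = unicodedata.normalize("NFC", text)
    text = INVISIBLE.sub("", text)
    text = WHITESPACE.sub(" ", text).strip()
    return text.casefold()


def content_key(doc: dict) -> str:
    """SHA-256 of normalized title + body; equal keys count as duplicates."""
    basis = normalize_text(doc.get("title", "")) + "\n" + normalize_text(doc.get("body", ""))
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()


def dedupe(docs: list[dict]) -> list[dict]:
    """Keep the first document for each content key, preserving input order."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        key = content_key(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def build_search_params(query: str) -> dict:
    """Typesense search parameters with per-field weights.

    Field names and weights are hypothetical; title matches count most.
    """
    return {
        "q": normalize_text(query),
        "query_by": "title,headings,body",
        "query_by_weights": "4,2,1",
    }
```

Hashing normalized content catches exact copies that differ only in spacing, case, or invisible characters. Catching looser near-duplicates would need a technique such as shingling or MinHash, which this sketch does not attempt.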

That is real progress because better search is usually about disciplined subtraction as much as addition: less junk, less accidental ranking, less interface noise, fewer excuses.

What the real bottleneck is now

The main bottleneck is still search relevance and trust.

It is not that hard to return something. The hard part is returning results that feel sensible for the query and trustworthy enough that people stop second-guessing the engine. That includes broad queries, navigational queries, official-information queries, entity queries, and ambiguous Khmer phrasing where the words alone do not fully reveal the intent.

This is also where the benchmark work matters. A handful of good-looking queries proves nothing. The product is judged by how it behaves, again and again, across a representative set of real Khmer searches.
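One way to keep benchmark evaluation honest is to score every query in a fixed set the same way on every run. The sketch below uses mean reciprocal rank (MRR) over the top 10 results. The source does not say which metric KhmerWebSearch.com uses, so MRR, the cutoff, and the benchmark format (each query mapped to a set of acceptable result URLs) are all assumptions. `search` is a stand-in for whatever function calls the live API, and the example URLs are placeholders.

```python
from typing import Callable, Mapping, Sequence


def reciprocal_rank(results: Sequence[str], relevant: set[str], k: int = 10) -> float:
    """1/rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, url in enumerate(results[:k], start=1):
        if url in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    benchmark: Mapping[str, set[str]],
    search: Callable[[str], Sequence[str]],
    k: int = 10,
) -> float:
    """Average reciprocal rank over every query in the benchmark."""
    if not benchmark:
        return 0.0
    total = sum(reciprocal_rank(search(q), rel, k) for q, rel in benchmark.items())
    return total / len(benchmark)


# Usage with a stand-in search function (hypothetical data):
benchmark = {"q1": {"https://a.example/"}, "q2": {"https://b.example/"}}
fake_results = {
    "q1": ["https://a.example/", "https://x.example/"],
    "q2": ["https://x.example/", "https://b.example/"],
}
print(mean_reciprocal_rank(benchmark, lambda q: fake_results[q]))  # 0.75
```

Here the expected page lands at rank 1 for one query and rank 2 for the other, so the score is (1 + 1/2) / 2 = 0.75. Tracking a number like this across releases, on the same query set, is what separates measured improvement from impressions.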

Why this stage is harder than it looks

Search quality is spread across the whole stack.

If source coverage is incomplete, obvious targets are missing. If extraction is weak, good pages become weak documents. If timestamps are shaky, freshness gets distorted. If snippets are weak, even decent results feel cheap. If source priors are missing, navigational queries drift. If official-intent queries are treated like generic news searches, the product feels dumb.

So the work now is less about shipping another shiny feature and more about removing distortion layer by layer.

Immediate goal

The immediate goal is simple: make the results noticeably better on real Khmer queries.

That means tighter source rules, better indexing coverage where it matters, better intent handling, better document cleanliness, stronger evaluation, and more honest use of user behavior and benchmark evidence. In plain terms: the top results should feel more useful, more expected, and less accidental.

The right bar is not “good enough for a demo.” The right bar is that repeated use starts building confidence.

Main focus from here

The focus from here is pretty clear.

Keep expanding domain support carefully. Keep tightening indexing rules. Keep improving source-intent aliases and practical-intent routing. Keep measuring against representative Khmer benchmarks instead of vibes. Keep the interface fast and quiet. And keep making search results earn trust one correction at a time.
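Source-intent aliases can be pictured as a lookup from phrases people type to the source they most likely mean. The sketch below is hypothetical throughout. The alias table, the placeholder domain `moh.example.kh`, and the choice to simply promote matching results to the top are illustrations, not how KhmerWebSearch.com actually routes queries.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical alias table mapping normalized query phrases to the source
# domain they most likely refer to. The domain is a placeholder, not a real site.
SOURCE_ALIASES = {
    "ministry of health": "moh.example.kh",
    "ក្រសួងសុខាភិបាល": "moh.example.kh",  # "Ministry of Health" in Khmer
}


def detect_source_intent(query: str) -> Optional[str]:
    """Return the domain a navigational query points at, or None if no alias matches."""
    normalized = " ".join(query.split()).casefold()
    for alias, domain in SOURCE_ALIASES.items():
        if alias in normalized:
            return domain
    return None


def from_domain(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def rerank_for_source(query: str, urls: List[str]) -> List[str]:
    """Promote results from the intended source, keeping relative order otherwise."""
    domain = detect_source_intent(query)
    if domain is None:
        return list(urls)
    preferred = [u for u in urls if from_domain(u, domain)]
    rest = [u for u in urls if not from_domain(u, domain)]
    return preferred + rest
```

A hard promotion like this is the bluntest option. A production ranker would more likely add a boost to the matching source's score, so a clearly better result from elsewhere can still win.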

That work is slower than a flashy launch cycle, but it is the real work. KhmerWebSearch.com becomes useful if the full loop, from source to document to result, gets more disciplined over time.