I made this website the other day: searchbrew.com. This article is about how I made it.



It's a search engine for Homebrew packages. Brew by itself doesn't have a very good search, because the Homebrew maintainers don't want to collect description metadata about packages. Brew does have a link to each package's homepage. I tried fetching descriptions from those homepages and indexing them for search, but it didn't work very well. Then I came across the project telemachus/homebrew-desc, which includes project descriptions. Homebrew-desc isn't always up to date, but it's pretty good.

The "all" link lists every package and its description, which is useful for spotting errors.

Searchbrew.com gets its list of Homebrew packages from the Homebrew git repo and the descriptions from homebrew-desc, and puts them into Elasticsearch. The index is updated periodically. When a user searches, the query goes straight to Elasticsearch.
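A minimal sketch of that sync step, under a few assumptions: the formula list comes from the Homebrew repo, where each formula lives in its own `<name>.rb` file (for example `Library/Formula/wget.rb`), and the homebrew-desc descriptions have already been loaded into a `Map` of name to description. The function names, the `packages` index name and the `description` field are hypothetical; `title` is the field the query in this post searches.

```javascript
// Hypothetical sketch: turn formula file paths plus a Map of descriptions
// into an Elasticsearch _bulk request body. Index and field names are assumed.

// Derive a formula name from a path such as "Library/Formula/wget.rb".
// Returns null for files that are not formulae.
function formulaName(path) {
  const file = path.split("/").pop();
  return file.endsWith(".rb") ? file.slice(0, -3) : null;
}

// Build a newline-delimited JSON body for the _bulk API.
function buildBulkBody(paths, descriptions, index = "packages") {
  const lines = [];
  for (const path of paths) {
    const name = formulaName(path);
    if (name === null) continue; // skip READMEs and other non-formula files
    // Action line: the formula name is the document _id, so a periodic
    // re-sync overwrites existing documents instead of duplicating them.
    lines.push(JSON.stringify({ index: { _index: index, _id: name } }));
    // Source line: the searchable document. Formulae that have no
    // homebrew-desc entry yet are still indexed, with an empty description.
    lines.push(JSON.stringify({ title: name, description: descriptions.get(name) ?? "" }));
  }
  // The _bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

const body = buildBulkBody(
  ["Library/Formula/wget.rb", "Library/Formula/jq.rb", "Library/Formula/README.md"],
  new Map([["wget", "Internet file retriever"]])
);
console.log(body);
```

The resulting string would be POSTed to the cluster's `_bulk` endpoint with `Content-Type: application/x-ndjson`. This assumes a recent Elasticsearch release; older releases also needed a document `_type`. Formulae removed from Homebrew would need separate delete actions, which this sketch doesn't handle.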

The Elasticsearch query started out as a fuzzy search on all fields. The results were poor: exact matches often weren't ranked first, even though I expected ES to favour them. I added an exact-match query, which improved things, but packages with the term in their description were often ranked above packages with the term in their title. I thought ES was supposed to favour shorter fields. So I added a match query on the title, and the results were good. The ES query is below.

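query: {
  bool: {
    should: [
      {query_string: {query: q + '~'}},  // fuzzy: Lucene's ~ operator (applies to the last term only)
      {query_string: {query: q}},        // exact (non-fuzzy) match on all fields
      {match: {title: q}}                // extra score when the title matches
    ]
  }
}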
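For comparison, here is a hypothetical alternative, not the query searchbrew runs: the same three ideas (fuzzy match, exact match, preference for title hits) written with `multi_match` and `match` queries. Unlike a trailing `~` in `query_string`, `fuzziness: "AUTO"` applies to every term. The `description` field name and the boost value of 2 are assumptions picked for illustration.

```javascript
// Hypothetical alternative to the query above, using match-based clauses.
// "title" comes from the post; "description" and the boost are assumed.
function buildSearchBody(q) {
  return {
    query: {
      bool: {
        should: [
          // Fuzzy clause: fuzziness applies to every term of q.
          { multi_match: { query: q, fields: ["title", "description"], fuzziness: "AUTO" } },
          // Exact clause: same fields without fuzziness, so exact hits
          // match this clause too and score higher overall.
          { multi_match: { query: q, fields: ["title", "description"] } },
          // Title clause: an explicit boost so name hits outrank description hits.
          { match: { title: { query: q, boost: 2 } } },
        ],
      },
    },
  };
}

console.log(JSON.stringify(buildSearchBody("image magick"), null, 2));
```

One side effect of this design is that `match` and `multi_match` don't parse Lucene query syntax, so user input with an unbalanced `(` or a stray `"` can't make the search fail to parse the way it can with `query_string`.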

At the moment I'm happy with searchbrew, even though its data isn't perfect. There is a delay between a new package being added to Homebrew and a description being added to homebrew-desc. I could build something to make adding descriptions easier, and maybe add some alerting, but for now it's fine.
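If that alerting were ever added, its core could be as small as the sketch below: list the formulae that exist in the Homebrew repo but have no usable homebrew-desc entry yet. The function name and input shapes (an array of formula names and a `Map` of descriptions) are hypothetical.

```javascript
// Hypothetical sketch: find formulae with a missing or blank description.
function missingDescriptions(formulaNames, descriptions) {
  return formulaNames
    // A missing entry or a whitespace-only description both count as missing.
    .filter((name) => !(descriptions.get(name) ?? "").trim())
    .sort();
}

const missing = missingDescriptions(
  ["wget", "jq", "newtool"],
  new Map([["wget", "Internet file retriever"], ["jq", "  "]])
);
console.log(missing); // [ 'jq', 'newtool' ]
```

Run after each periodic sync, a non-empty result could trigger an email or a chat message.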