Search redesign for 1.0

From StatusNet

Desired features

Low-level

Filtering

Output

Current infrastructure

As of 0.9.x....

SearchEngine parent class and children in lib/search_engines.php...

  • you take your given class and initialize it with a target table (notice or profile)
  • you toss an opaque query into it with $engine->query()
  • that loads up something into a DB_DataObject in $engine->target
  • you go through that list outputting things

Problems:

  • can only support specific tables mentioned
  • fields and filtering are not really provided as options; the query type is fixed to the table
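The flow and its limits can be sketched roughly as follows. This is a toy model in Python, not the PHP in lib/search_engines.php: the class names, the Target stand-in for the DB_DataObject, and the in-memory rows are all assumptions made for illustration.

```python
class SearchEngine:
    """Toy stand-in for the 0.9.x flow: one engine bound to one table."""

    SUPPORTED_TABLES = ("notice", "profile")

    def __init__(self, target, table):
        # Problem 1: only the tables listed here can be searched at all.
        if table not in self.SUPPORTED_TABLES:
            raise ValueError(f"unsupported table: {table}")
        self.target = target  # plays the role of the DB_DataObject
        self.table = table

    def query(self, q):
        raise NotImplementedError


class LikeEngine(SearchEngine):
    # Problem 2: the searched fields are fixed per table; callers cannot
    # pick fields or add filters.
    FIELDS = {
        "notice": ("content",),
        "profile": ("nickname", "fullname", "location", "bio", "homepage"),
    }

    def query(self, q):
        fields = self.FIELDS[self.table]
        self.target.results = [
            row for row in self.target.rows
            if any(q in (row.get(f) or "") for f in fields)
        ]
        return bool(self.target.results)


class Target:
    """Hypothetical result holder; rows are plain dicts here."""

    def __init__(self, rows):
        self.rows = rows
        self.results = []


notices = Target([
    {"id": 1, "content": "this is a nifty post"},
    {"id": 2, "content": "something else"},
])
engine = LikeEngine(notices, "notice")
engine.query("nifty")
found_ids = [row["id"] for row in engine.target.results]
```

The caller only ever passes an opaque string, so there is nowhere to say "only this poster" or "only this group" without changing the interface.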

MySQL fulltext

  • profile
    • MATCH(nickname, fullname, location, bio, homepage) AGAINST (q IN BOOLEAN MODE)
      • and runs a second query with strtolower() applied if the query isn't all lowercase [may break on UTF-8, since strtolower() is byte-oriented]
  • notice
    • excludes (notice.is_local = Notice::GATEWAY)
    • MATCH(content) AGAINST (q IN BOOLEAN MODE)
      • also runs the second query with strtolower() applied if the query isn't all lowercase
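To illustrate the UTF-8 concern with the lowercase retry, here is a small Python sketch (not the PHP code itself). It assumes the mildest failure mode, where a byte-oriented lowercase only touches ASCII letters; depending on PHP version and locale, strtolower() may behave differently, so treat this as an approximation. The function names are made up for the example.

```python
def bytewise_lower(text):
    """Lowercase only ASCII letters, operating on the UTF-8 bytes."""
    return text.encode("utf-8").lower().decode("utf-8")


def unicode_lower(text):
    """Lowercase with full Unicode awareness."""
    return text.lower()


query = "ÉCOLE"
ascii_only = bytewise_lower(query)   # the leading É is left uppercase
full = unicode_lower(query)          # the leading É becomes é
mismatch = ascii_only != full
```

When the two results differ, the "lowercased" second query is not actually lowercase for non-ASCII text, so the retry may not match what the user expected.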

MySQL LIKE

  • profile
    • (nickname LIKE "%q%" OR
      fullname LIKE "%q%" OR
      location LIKE "%q%" OR
      bio LIKE "%q%" OR
      homepage LIKE "%q%")
  • notice
    • not excluding gatewayed notices
    • content LIKE "%q%"

PostgreSQL

  • profile
    • textsearch @@ plainto_tsquery(q)
  • notice
    • not excluding gatewayed notices
    • to_tsvector('english', content) @@ plainto_tsquery(q)

Sphinx plugin

  • profile
    • passes query through to Sphinx
    • indexing query:
      sql_query = SELECT id, UNIX_TIMESTAMP(created) as created_ts, nickname, fullname, location, bio, homepage FROM profile
      sql_attr_timestamp = created_ts
  • notice
    • passes query through to Sphinx
    • not excluding gatewayed notices
    • indexing query:
      sql_query = SELECT id, UNIX_TIMESTAMP(created) as created_ts, content FROM notice
      sql_attr_timestamp = created_ts

Notes

General

  • notices
    • filter by:
      • poster (profile_id)
      • group...? tag...?
  • simulate attributes by adding special tags into the fulltext?
    • "this is a nifty post @profile_id:1234 @group:32 @group:65"
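A minimal sketch of the special-tag idea, in Python for illustration: the function name and parameters are hypothetical, and whether a given fulltext engine keeps a token like "@group:32" intact depends on its tokenizer settings, which this sketch does not address.

```python
def with_pseudo_attributes(content, profile_id, group_ids=()):
    """Append filterable pseudo-tags to the text handed to the indexer."""
    tags = [f"@profile_id:{profile_id}"]
    tags.extend(f"@group:{gid}" for gid in group_ids)
    return content + " " + " ".join(tags)


indexed_text = with_pseudo_attributes("this is a nifty post", 1234, [32, 65])
```

A filtered search would then add the matching tag as a required term alongside the user's own query.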

MySQL

  • Minimum lengths and stopwords can be disabled/changed at server config level...
  • Or can be gotten around with a customized search target field -- but that's harder
  • Wildcards can be supported, at least to some degree
  • Default 'OR' search sucks horribly, probably could benefit from query rewriting
  • Doesn't work with InnoDB, which we prefer for main tables
    • if used by default, consider splitting out MyISAM search index tables
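One possible form of the query rewriting mentioned above is to mark every bare term as required, so that a boolean-mode search behaves like AND instead of OR. The Python sketch below is only an assumption about what such a rewrite could look like; the function name is made up, and it leaves alone any term the user already prefixed with a boolean-mode operator.

```python
import re

# Operators that may prefix a term in boolean mode; such terms are kept as-is.
_PREFIX_OPERATORS = "+-<>~("

# A token is either a double-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"[^"]*"|\S+')


def require_all_terms(query):
    """Prefix each bare term or phrase with '+' so all terms must match."""
    rewritten = []
    for token in _TOKEN.findall(query):
        if token[0] in _PREFIX_OPERATORS:
            rewritten.append(token)
        else:
            rewritten.append("+" + token)
    return " ".join(rewritten)


simple = require_all_terms("nifty post")
mixed = require_all_terms('-spam "nifty post" foo*')
```

The rewritten string would still need to be passed to AGAINST (... IN BOOLEAN MODE) as a properly quoted or bound value.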

MySQL like

  • I'd prefer to kill this as it scales very poorly
  • Current implementation isn't escaping properly (looks like % and _ wildcards will actually go through)
  • no implementation for fancier search keywords
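If the LIKE engine is kept, the wildcard problem could be handled by escaping the user's text before wrapping it in %...%. The sketch below is in Python for illustration and assumes backslash as the LIKE escape character (the MySQL default); the function names are hypothetical.

```python
def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so % and _ in user input match literally."""
    out = []
    for ch in term:
        # The escape character itself must be escaped too.
        if ch in ("%", "_", escape):
            out.append(escape)
        out.append(ch)
    return "".join(out)


def like_pattern(term):
    """Build a 'contains' pattern from raw user input."""
    return "%" + escape_like(term) + "%"


pattern = like_pattern("100%_sure")
```

This only neutralizes the LIKE wildcards; the resulting pattern still has to go through normal SQL string quoting or a bound parameter.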

Postgres

  • will need to look up some docs...

Sphinx

  • sql_query specifies which fields get pulled for actual indexing
  • sql_query_info setting is used for debugging queries with the cli client only
  • certain column values can be marked as "attributes", which can be used for filtering or sorting results (but not for searching)
  • double-check the query language but I recall it being roughly sensible