This package allows users to perform full text searches of the ]project-open[ database. All searching functions are permission sensitive and security conscious of a users rights and privileges. This package is an adapter, allowing PostgreSQL 's TSearch2 module to index ]project-open[ contents and then provide the search functionality for project, users, forums, files.
]project-open[ Search is implemented using the TSearch2 full text engine that comes as a part of the PostgreSQL search engine (on Oracle, ]project-open[ supports Intermedia).
TSearch2 allows for a tight integration of full text indices and SQL statements, allowing to mix instructions for structured queries (in order to determine permissions and object relationships) and access to full text indices.
The implementation consists of the following elements:
-- The main search table with Full Text Index. -- create table im_search_objects ( object_id integer, -- may include "object types" outside of OpenACS -- that are not in the "acs_object_types" table. object_type_id integer constraint im_search_objects_object_type_id_fk references im_search_object_types on delete cascade, -- What is the topmost container for this object? -- Allows to speed up the elimination of objects -- that the current user can't access biz_object_id integer constraint im_search_objects_biz_obj_id_fk references acs_objects on delete cascade, -- Owner may not need to be a "user" (in the case -- of a deleted user). Owners can be asked to give -- permissions to a document even if the document -- is not readable for the searching user. owner_id integer constraint im_search_objects_owner_id_fk references persons on delete cascade, -- Bitset with one bit for each "profile": -- We use an integer instead of a "bit varying" -- in order to keep the set compatible with Oracle. -- A set bit indicates that object is readable to -- members of the profile independent of the -- biz_object_id permissions. profile_permissions integer, -- counter for number of accesses to this object -- either from the permission() proc or from -- reading in the server log file. popularity integer, -- Full Text Index fti tsvector, -- For tables that don't respect the OpenACS object -- scheme we may get "object_id"s that start with 0. primary key (object_id, object_type_id) ); create index im_search_objects_fti_idx on im_search_objects using gist(fti); create index im_search_objects_object_id_idx on im_search_objects (object_id);
The following steps are executed during each search:
TSearch2 contains several features allowing to adapt the search process to specific languages such as dictionaries, language specific stop words etc. However, ]project-open[ needs to be able to operate with content items of several languages at the same time.
However, it is not always possible to determine the language of a content item, so that we have decided not to implement these features at the moment.
However, the practical experiences of the use of TSearch with languages such as French, Spanish and German has required us to add a "normalization" feature to TSearch2 that "normalized" search content and queries in order to deal with accents and notational variants:
This normalization allows to search for "carlos" and to receive search results such as "CarlÃ³s" or "email@example.com".
Actually, we had to implement this normalization ourselves, because there was no code on the PostgreSQL page about it. Also, the PostgreSQL "conversion" functionality (UTF-8 => SQL_ASCII) did not elimiated the accents. Please check for the latest version at Sourceforge.net.
Ranking is currently limited to the built-in TSearch2 ranking functionality. In the future we are going to use several types of statistics to determine the "popularity" of a content item.
tcl/intranet-search-pg-procs.tcl Procedures for tsearch full text enginge driver
im_package_search_id Returns the ID of the current package. im_tsvector_to_headline Converts a tsvector (or better: its string representation) into a text string, obviously without the stop words. tsearch2::build_query Convert conjunctions to query characters for tsearch2 and => & not => ! or => | space => | (or) tsearch2::driver_info tsearch2::index add object to full text index tsearch2::search ftsenginedriver search operation implementation for tsearch2 tsearch2::summary Highlights matching terms. tsearch2::unindex Remove item from FTS index tsearch2::update_index update full text index
sql/postgresql/intranet-search-pg-create.sql sql/postgresql/intranet-search-pg-drop.sql sql/postgresql/upgrade/upgrade-184.108.40.206.0-220.127.116.11.1.sql sql/postgresql/upgrade/upgrade-18.104.22.168.2-22.214.171.124.3.sql sql/postgresql/upgrade/upgrade-126.96.36.199.0-188.8.131.52.1.sql sql/postgresql/upgrade/upgrade-184.108.40.206.0-220.127.116.11.1.sql sql/postgresql/upgrade/upgrade-18.104.22.168.1-22.214.171.124.2.sql sql/postgresql/upgrade/upgrade-126.96.36.199.0-188.8.131.52.1.sql