HYKE LogoSemantic Search

State of the art
An overview of SemSearch
The Google-like query interface
Making sense of the user query
Translating the user query into formal queries
Implementation and experimental evaluation
Conclusions and future work
1st Workshop on Semantic Search
2nd Workshop on Semantic Search
3rd Workshop on Semantic Search
4th Workshop on Semantic Search

Translating the user query into formal queries

In this step, the search engine takes as input the semantic matches of user
search terms and outputs an appropriate formal query according to the semantic
meanings of keywords. To achieve this task, the search engine needs to capture
the focus of the user query (i.e., the type of the expected search results). As
described earlier in Section 4, the subject keyword specifies the type of the
expected search result. Thus, it is reasonable to expect that the queried subject
is a general topic or concept (i.e. class). In the case when the subject keyword
does not match any class, the search engine needs to figure out what the expected
results are. This will be discussed in the following subsections.
To better understand how to construct formal queries from user queries, we
classify user queries into two types: i) simple queries which only comprise two
keywords, and ii) complex queries where more than two keywords are involved. In
the case of simple queries where the types of semantic entity match combinations
are fixed, we developed a set of templates to support the formulation of formal
queries. The situation is much trickier in the case of complex queries where there
are many variables for keywords combinations.
In this section, we first look at the formulation of formal queries from simple
user queries. We then investigate how to handle complex ones.
As we used theSemSearch: A Search Engine for the Semantic Web 9
Sesame SeRQL language3 as the formal query language in the prototype of the
SemSearch search engine, we explain the mechanism using the same language
(Please note the underlying approach does not confine itself to any specific query
1 Simple user queries
As described earlier, we define simple user queries as those which involve only
two keywords. In this subsection, we first describe the query templates. We then
describe how to instantiate the query templates to construct formal queries in
the case of simple user queries.

Fig. 3. The SeRQL query templates for two semantic entities.

Query templates. The query templates are developed to describe how to retrieve
relations between two semantic entities. As each semantic entity can be a class,
a property or a instance, there are nine possible combinations in terms of formu-
lating queries to find their relations. In the situation when the subject keyword
does not produce a class match, but the other keyword does, we swap their posi-
tion and treat the other keyword as the subject keyword. Thus, we are left with
seven combinations. Figure 3 shows all these combinations and their templates.
Now let us investigate the first combination, which is when two keywords in
a query both match classes. Suppose the subject keyword matches the class Cs
and the other keyword matches the class Ck. The search results are expected
to be the instances of the class Cs which have explicitly specified relations with
the instances of the class Ck. For example, when querying for news about “phd
students”, the expected results are the news entries in which phd students are
involved. Further, the search results are also expected to be self-explanatory,
e.g., to motivate why certain news entries appear and others do not. Thus,
along with the retrieving of news instances, the related phd stduents and the
relations between students and news entries also need to be retrieved. Therefore,
the search results of the query news:phd students are expected to be triples of
(news, relation, phd-sudent). Finally, in order to make the search results more
understandable, the labels of each semantic entity also need to be pulled out to
give an understandable explanation to users.
As shown in Figure 3, the query template of the combination described above
is composed of two queries, which cover the relations from the instance of Cs
to the instances of Ck in both directions. There are three pairs of variables in
the query. The first pair, i1 and li1, denotes the instances of the class Cs (i.e.,
URIs and labels), e.g. news stories in the example mentioned above. The second
pair, p and lp, indicates the relations defined between i1 and i2, e.g. has-author
or mentions-person in the example. The last pair is the variables i2 and li2
which represent the instances of the class Ck, e.g. phd students involved in the
corresponding news stories.
As we can see from the code, we only take into account the direct relations
that are explicitly specified in the back-end repository. The techniques for finding
implicit relations like the ones used in TAP and in have not been used. This
is for the sake of simplicity and for the sake of improving response time.
There are six other combinations, which have also been shown in Figure 3.
When the subject keyword matches a class, the search results are the instances
of the matched class that have relations with the matches of other keywords.
In the situations when there are no class matches found for both keywords,
the focus of user query varies according to the type of the semantic matches of
– When the subject keyword matches an instance and the keyword matches
a property, the search results are the values of the matched property of the
matched instance. For example, the results of the query AKT:member are
the members of the project AKT.SemSearch: A Search Engine for the Semantic Web 11
– When the subject match is a property and the keyword matches an instance,
the results are the semantic entities which have the matched instance as the
value of the matched property. For example, when query for author:enrico,
the results are the instances (e.g. publications) that have the person Enrico
Motta as authors.
– When both keywords match instances, the intension of the user query is often
to find out the instances that have relations with both matched instances.
For example, the results of the query Victoria:Enrico are the instances that
have relations with both people, e.g., projects that both people participate
in, publications that both people co-authored, etc.
– When both keywords match properties, the results are the instances that
have both of the matched properties as relations to other instances. One
example is the query author:member, which gives back persons who are both
authors and project members.
Query formulation. In the context of simple queries where only two keywords
are involved, the task of query formulation is relatively easy, which is to initiate
the template that corresponds to the combinations of the semantic matches of
the user keywords. As each keyword may match more than one semantic entity,
often more than one query needs to be constructed. More specifically, if the
subject keyword matches ns semantic entities and the other keyword has nk
matches, there are ns*nk queries that need to be constructed. For example in
the query news:phd students, two formal queries need to be constructed. This
is because while the subject keyword matches the class news-item the keyword
“phd students” matches two semantic entities, including the class phd-student
and the instance phd. The more the generated formal queries, the slower the
search process will become. This problem becomes more acute when there are
many keywords involved in the user query. We will discuss how to reduce the
number of formal queries in the following.
2 Complex user queries
We define complex queries as those which involve more than two keywords. As
described earlier in Section 4, such keywords can be required or optional, and
there is always at least one required keyword, which is the subject keyword. The
Text Search Layer of the search engine finds semantic matches for the keywords.
To construct formal queries, the search engine needs to combine the semantic
matches together and construct sub-queries for each of the combinations. A key
operational problem is that in real world situations there can be a large number
of matches and hence even more combinations.
Combining different keywords. For keywords k1, k2, ..., kn, suppose that the
number of the semantic matches of the keyword ki is ni. There will be n1*n2*...*nn
(which can be respresented as
i=1 ni) different combinations when considering
all the keywords as required ones. Each combination of the matches corresponds
to a RDF-based formal query. Apart from considering all the keywords as re-
quired ones, the search engine also needs to investigate the combinations where
one or more keywords are left out, in order to producing complete result sets to
end users. Table 1 shows how to calculate the numbers of combinations pos-
sible for choosing different numbers of keywords from the keywords set. Note
that because keyword k1, the subject keyword, is always a required keyword in
SemSearch, n1 appears in all these calculations.

Table 1. Numbers of keywords combinations for keywords k2, k2, ..., kn

The total number of keywords combinations is the sum of the combination
number of choosing n, n-1,..., 2 keywords from the specified keywords, which is
i=2 Ci. This indicates that the total number of different combinations can get
huge when i) there are many keywords involved and ii) some keywords are very
generic and thus have many matches. For instance, imagine there are 3 keywords
in the user query (including the subject keyword). Each keyword matches 3 dif-
ferent semantic entities. There will be 3*3*3=27 combinations for considering all
three keywords as required ones and 3*(3+3)=18 combinations for only consid-
ering two. Thus the total number of combinations is 27+18=45. This indicates
that rules are needed to reduce the number of matches for each keyword. In this
context, we used several heuristic rules.
– Rule1. The subject keyword always matches class entities when there are
more than two keywords involved in the user query.
– Rule2. Choose the closest entity matches of the keyword. This rule can
significantly help us to reduce the number of entity matches. We are however
concerned about losing useful matches.
– Rule3. Choose the most specific class match among the class matches.
Formulating formal queries For each combination of semantic matches, a formal
query needs to be constructed. As mentioned earlier, we assume that the subject
keyword always matches at least one class concept. Thereby, for keywords k1, k2,
..., kn, we can expect that the main search results will always be the instances
of the matches of the keyword k1 (i.e. the subject keywords) that have relationsSemSearch:
A Search Engine for the Semantic Web 13
with other keywords. Furthermore, as described earlier in the last section, in
order to allow the search results to be understandable, the relations and the
related instances also need to be pulled out, which explains the search results.
A formal query in SeRQL comprises three building blocks: the head block,
which describes what needs to be retrieved, the body block, which describes
how, and the condition block, which expresses conditions. In addition, in order
to cover relations of two entities in both directions, the query also comprises a
union block, which covers all the possible relations between the involved key-
words. The construction of all these blocks depends on the type (i.e. class, prop-
erty, or instance) of the semantic entity match of each keyword contained in a
combination of keywords’ matches.
Figure 4 shows a fragment of Java code for constructing SeRQL queries from
combinations of semantic matches of keywords. In the head block, two pairs of
variables (pj , lpj) and (ij , lij) are added for each keyword, which give back
the relations of the search results and the keyword in question thus facilitating
the self-explanation of search results. The body block specifies that the search
results and each keyword must be explicitly connected in triples according to
the type of the semantic entity match of each keyword. The union block covers
the other direction that is not covered in the main query. These building blocks
are derived from the templates of simple user queries described in section 1.