Translating the user query into formal
In this step, the search engine takes as input the semantic matches of
search terms and outputs an appropriate formal query according to the
meanings of keywords. To achieve this task, the search engine needs to
the focus of the user query (i.e., the type of the expected search
described earlier in Section 4, the subject keyword speciﬁes the type of
expected search result. Thus, it is reasonable to expect that the
is a general topic or concept (i.e. class). In the case when the subject
does not match any class, the search engine needs to ﬁgure out what the
results are. This will be discussed in the following subsections.
To better understand how to construct formal queries from user queries,
classify user queries into two types: i) simple queries which only
keywords, and ii) complex queries where more than two keywords are
the case of simple queries where the types of semantic entity match
are ﬁxed, we developed a set of templates to support the formulation of
queries. The situation is much trickier in the case of complex queries
are many variables for keywords combinations.
In this section, we ﬁrst look at the formulation of formal queries from
user queries. We then investigate how to handle complex ones.
As we used theSemSearch: A Search Engine for the Semantic Web 9
Sesame SeRQL language3 as the formal query language in the prototype of
SemSearch search engine, we explain the mechanism using the same
(Please note the underlying approach does not conﬁne itself to any
1 Simple user queries
As described earlier, we deﬁne simple user queries as those which
two keywords. In this subsection, we ﬁrst describe the query templates.
describe how to instantiate the query templates to construct formal
the case of simple user queries.
Fig. 3. The SeRQL query templates for two semantic entities.
Query templates. The query templates are developed to describe how to
relations between two semantic entities. As each semantic entity can be
a property or a instance, there are nine possible combinations in terms
lating queries to ﬁnd their relations. In the situation when the subject
does not produce a class match, but the other keyword does, we swap
tion and treat the other keyword as the subject keyword. Thus, we are
seven combinations. Figure 3 shows all these combinations and their
Now let us investigate the ﬁrst combination, which is when two keywords
a query both match classes. Suppose the subject keyword matches the
and the other keyword matches the class Ck. The search results are
to be the instances of the class Cs which have explicitly speciﬁed
the instances of the class Ck. For example, when querying for news about
students”, the expected results are the news entries in which phd
involved. Further, the search results are also expected to be
e.g., to motivate why certain news entries appear and others do not.
along with the retrieving of news instances, the related phd stduents
relations between students and news entries also need to be retrieved.
the search results of the query news:phd students are expected to be
(news, relation, phd-sudent). Finally, in order to make the search
understandable, the labels of each semantic entity also need to be
pulled out to
give an understandable explanation to users.
As shown in Figure 3, the query template of the combination described
is composed of two queries, which cover the relations from the instance
to the instances of Ck in both directions. There are three pairs of
the query. The ﬁrst pair, i1 and li1, denotes the instances of the class
URIs and labels), e.g. news stories in the example mentioned above. The
pair, p and lp, indicates the relations deﬁned between i1 and i2, e.g.
or mentions-person in the example. The last pair is the variables i2 and
which represent the instances of the class Ck, e.g. phd students
involved in the
corresponding news stories.
As we can see from the code, we only take into account the direct
that are explicitly speciﬁed in the back-end repository. The techniques
implicit relations like the ones used in TAP and in have not been used.
is for the sake of simplicity and for the sake of improving response
There are six other combinations, which have also been shown in Figure
When the subject keyword matches a class, the search results are the
of the matched class that have relations with the matches of other
In the situations when there are no class matches found for both
the focus of user query varies according to the type of the semantic
– When the subject keyword matches an instance and the keyword matches
a property, the search results are the values of the matched property of
matched instance. For example, the results of the query AKT:member are
the members of the project AKT.SemSearch: A Search Engine for the
Semantic Web 11
– When the subject match is a property and the keyword matches an
the results are the semantic entities which have the matched instance as
value of the matched property. For example, when query for
the results are the instances (e.g. publications) that have the person
Motta as authors.
– When both keywords match instances, the intension of the user query is
to ﬁnd out the instances that have relations with both matched
For example, the results of the query Victoria:Enrico are the instances
have relations with both people, e.g., projects that both people
in, publications that both people co-authored, etc.
– When both keywords match properties, the results are the instances
have both of the matched properties as relations to other instances. One
example is the query author:member, which gives back persons who are
authors and project members.
Query formulation. In the context of simple queries where only two
are involved, the task of query formulation is relatively easy, which is
the template that corresponds to the combinations of the semantic
the user keywords. As each keyword may match more than one semantic
often more than one query needs to be constructed. More speciﬁcally, if
subject keyword matches ns semantic entities and the other keyword has
matches, there are ns*nk queries that need to be constructed. For
the query news:phd students, two formal queries need to be constructed.
is because while the subject keyword matches the class news-item the
“phd students” matches two semantic entities, including the class
and the instance phd. The more the generated formal queries, the slower
search process will become. This problem becomes more acute when there
many keywords involved in the user query. We will discuss how to reduce
number of formal queries in the following.
2 Complex user queries
We deﬁne complex queries as those which involve more than two keywords.
described earlier in Section 4, such keywords can be required or
there is always at least one required keyword, which is the subject
Text Search Layer of the search engine ﬁnds semantic matches for the
To construct formal queries, the search engine needs to combine the
matches together and construct sub-queries for each of the combinations.
operational problem is that in real world situations there can be a
of matches and hence even more combinations.
Combining diﬀerent keywords. For keywords k1, k2, ..., kn, suppose that
number of the semantic matches of the keyword ki is ni. There will be
(which can be respresented as
i=1 ni) diﬀerent combinations when considering
all the keywords as required ones. Each combination of the
to a RDF-based formal query. Apart from considering all the keywords as
quired ones, the search engine also needs to investigate the
one or more keywords are left out, in order to producing complete result
end users. Table 1 shows how to calculate the numbers of combinations
sible for choosing diﬀerent numbers of keywords from the keywords set.
that because keyword k1, the subject keyword, is always a required
SemSearch, n1 appears in all these calculations.
Table 1. Numbers of keywords combinations for keywords k2, k2, ..., kn
The total number of keywords combinations is the sum of the combination
number of choosing n, n-1,..., 2 keywords from the speciﬁed keywords,
i=2 Ci. This indicates that the total number of diﬀerent combinations
huge when i) there are many keywords involved and ii) some keywords are
generic and thus have many matches. For instance, imagine there are 3
in the user query (including the subject keyword). Each keyword matches
ferent semantic entities. There will be 3*3*3=27 combinations for
three keywords as required ones and 3*(3+3)=18 combinations for only
ering two. Thus the total number of combinations is 27+18=45. This
that rules are needed to reduce the number of matches for each keyword.
context, we used several heuristic rules.
– Rule1. The subject keyword always matches class entities when there
more than two keywords involved in the user query.
– Rule2. Choose the closest entity matches of the keyword. This rule can
signiﬁcantly help us to reduce the number of entity matches. We are
concerned about losing useful matches.
– Rule3. Choose the most speciﬁc class match among the class matches.
Formulating formal queries For each combination of semantic matches, a
query needs to be constructed. As mentioned earlier, we assume that the
keyword always matches at least one class concept. Thereby, for keywords
..., kn, we can expect that the main search results will always be the
of the matches of the keyword k1 (i.e. the subject keywords) that have
A Search Engine for the Semantic Web 13
with other keywords. Furthermore, as described earlier in the last
order to allow the search results to be understandable, the relations
related instances also need to be pulled out, which explains the search
A formal query in SeRQL comprises three building blocks: the head block,
which describes what needs to be retrieved, the body block, which
how, and the condition block, which expresses conditions. In addition,
to cover relations of two entities in both directions, the query also
union block, which covers all the possible relations between the
words. The construction of all these blocks depends on the type (i.e.
erty, or instance) of the semantic entity match of each keyword
contained in a
combination of keywords’ matches.
Figure 4 shows a fragment of Java code for constructing SeRQL queries
combinations of semantic matches of keywords. In the head block, two
variables (pj , lpj) and (ij , lij) are added for each keyword, which
the relations of the search results and the keyword in question thus
the self-explanation of search results. The body block speciﬁes that the
results and each keyword must be explicitly connected in triples
the type of the semantic entity match of each keyword. The union block
the other direction that is not covered in the main query. These
are derived from the templates of simple user queries described in