#include <token.h>
Inherits safe_bool< T >.
Inherited by PhraseHunter::CorpusTokenBase, PhraseHunter::EmptyToken, and PhraseHunter::MutableToken.
Public Member Functions

| Type | Member | Description |
|---|---|---|
| virtual | ~Token () | |
| virtual size_t | length () const | Return the real length (bytes) of a token string in the index. |
| bool | isUniform () const | Returns true if all occurrences in all files are expected to look *exactly* the same. |
| virtual bool | isEmpty () const | Returns true if there are no actual occurrences of this Token in the corpus, i.e. its occurrence map is empty. |
| virtual unsigned int | corpusFrequency () const = 0 | Return the number of times this Token occurs in the corpus. |
| schma::UnicodePtr | tokenString () const | Returns this Token's string as a UnicodePtr. |
| virtual unsigned int | documentFrequency () const | Returns the number of documents in which this Token occurs. |
| bool | inDoc (DocID docID) const | Returns true if this Token occurs in the document docID. |
| const PositionList & | documentOccurrences (DocID docID) | Returns a reference to the PositionList holding all offsets of this Token in the document docID. |
| const OccurrenceMap & | allOccurrences () const | Return a reference to the entire OccurrenceMap of this Token. |
| virtual std::vector< DocID > | documentIDs () const | Returns the DocIDs of all documents in which this Token occurs. |
| virtual unsigned int | numTokens () const | |
| virtual TokenID | id () const | Returns the ID of this Token. |
Protected Member Functions

| Type | Member |
|---|---|
| | Token (const char *token) |
| | Token (schma::UnicodePtr tokenstring) |

Protected Attributes

| Type | Member |
|---|---|
| schma::UnicodePtr | m_tokenstring |
| OccurrenceMap | m_occurrences |

Static Protected Attributes

| Type | Member |
|---|---|
| static const int | SPACE_BETWEEN_TWO_TOKENS = 2 |
Detailed Description
A Token pairs a particular string (i.e. a word type) with a map of its occurrences in the corpus, i.e. the documents it appears in and its offsets within each document. Through its safe_bool base class, a Token may be used in boolean expressions.
Definition at line 46 of file token.h.
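The description above can be made concrete with a minimal sketch. It assumes, hypothetically, that DocID is an unsigned integer, that a PositionList is a std::vector of offsets, and that an OccurrenceMap is a std::map from DocID to PositionList; std::string stands in for schma::UnicodePtr. It also swaps the safe_bool base for C++11's explicit operator bool, which serves the same purpose, and assumes that a Token evaluates to true when it has occurrences. None of this is taken from token.h.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real PhraseHunter types.
using DocID = unsigned int;
using PositionList = std::vector<unsigned int>;       // offsets within one document
using OccurrenceMap = std::map<DocID, PositionList>;  // document -> offsets

class SketchToken {
public:
    explicit SketchToken(std::string s) : m_tokenstring(std::move(s)) {}

    // Hypothetical helper: record one occurrence at `offset` in document `doc`.
    void recordOccurrence(DocID doc, unsigned int offset) {
        m_occurrences[doc].push_back(offset);
    }

    // True when the occurrence map is empty, as documented for Token::isEmpty().
    bool isEmpty() const { return m_occurrences.empty(); }

    // C++11 alternative to the safe_bool idiom: works in `if (tok)` and `!tok`
    // but does not convert implicitly to int. Assumes true == has occurrences.
    explicit operator bool() const { return !isEmpty(); }

    const std::string& tokenString() const { return m_tokenstring; }

private:
    std::string m_tokenstring;    // the word type (std::string replaces schma::UnicodePtr)
    OccurrenceMap m_occurrences;  // where the word type occurs in the corpus
};
```

With this layout, `if (tok) { ... }` compiles while `int n = tok;` does not, which is the guarantee the safe_bool idiom was designed to give before C++11.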
Constructor & Destructor Documentation
PhraseHunter::Token::Token (const char *token) [inline, protected]
PhraseHunter::Token::Token (schma::UnicodePtr tokenstring) [inline, protected]
Member Function Documentation
virtual size_t PhraseHunter::Token::length () const [inline, virtual]
Return the real length (bytes) of a token string in the index.
Reimplemented in PhraseHunter::Phrase.
Definition at line 62 of file token.h.
References m_tokenstring, and SPACE_BETWEEN_TWO_TOKENS.
bool PhraseHunter::Token::isUniform () const [inline]
virtual bool PhraseHunter::Token::isEmpty () const [inline, virtual]
Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty.
Reimplemented in PhraseHunter::EmptyToken and PhraseHunter::CorpusToken.
Definition at line 72 of file token.h.
References m_occurrences.
Referenced by PhraseHunter::Phrase::getAdjacent().
virtual unsigned int PhraseHunter::Token::corpusFrequency () const [pure virtual]
Return the number of times this Token occurs in the corpus.
Implemented in PhraseHunter::EmptyToken, PhraseHunter::CorpusTokenBase, and PhraseHunter::MutableToken.
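Because corpusFrequency() is pure virtual, each subclass supplies its own count. The sketch below is hypothetical: it reuses assumed std::map/std::vector type aliases and assumes that a subclass which stores every offset can count occurrences by summing the sizes of all its PositionLists. BaseToken, CountingToken and NoToken are illustrative names, not PhraseHunter classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real PhraseHunter types.
using DocID = unsigned int;
using PositionList = std::vector<unsigned int>;
using OccurrenceMap = std::map<DocID, PositionList>;

// Mirrors Token's pure virtual corpusFrequency(); cannot be instantiated.
class BaseToken {
public:
    virtual ~BaseToken() = default;
    virtual unsigned int corpusFrequency() const = 0;
};

// Hypothetical subclass: one occurrence per stored offset, summed over documents.
class CountingToken : public BaseToken {
public:
    void recordOccurrence(DocID doc, unsigned int offset) {
        m_occurrences[doc].push_back(offset);
    }
    unsigned int corpusFrequency() const override {
        std::size_t total = 0;
        for (const auto& entry : m_occurrences)
            total += entry.second.size();
        return static_cast<unsigned int>(total);
    }
private:
    OccurrenceMap m_occurrences;
};

// Hypothetical subclass with no occurrences at all.
class NoToken : public BaseToken {
public:
    unsigned int corpusFrequency() const override { return 0; }
};
```

Callers that hold only a `const BaseToken&` get the subclass's count through virtual dispatch, which is how code written against Token can work with any of its concrete subclasses.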
schma::UnicodePtr PhraseHunter::Token::tokenString () const [inline]
Returns this Token's string as a UnicodePtr.
Definition at line 78 of file token.h.
References m_tokenstring.
virtual unsigned int PhraseHunter::Token::documentFrequency () const [inline, virtual]
Returns the number of documents in which this Token occurs.
Reimplemented in PhraseHunter::LightCorpusToken.
Definition at line 81 of file token.h.
References m_occurrences.
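Under an assumed std::map layout for OccurrenceMap, documentFrequency() reduces to counting keys, since each key is one document. This is a sketch, not the body from token.h; the free function and the type aliases are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real PhraseHunter types.
using DocID = unsigned int;
using PositionList = std::vector<unsigned int>;
using OccurrenceMap = std::map<DocID, PositionList>;

// Each key is one document, so the document frequency is the number of keys.
unsigned int documentFrequency(const OccurrenceMap& occurrences) {
    return static_cast<unsigned int>(occurrences.size());
}
```

For a token occurring three times in one document and once in another, the document frequency is 2, whereas the total number of occurrences in the corpus is 4.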
bool PhraseHunter::Token::inDoc (DocID docID) const [inline]
Returns true if this Token occurs in the document docID.
Definition at line 84 of file token.h.
References m_occurrences.
Referenced by PhraseHunter::MutableToken::removeDocument().
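A const membership test such as inDoc() can be written with std::map::find, which, unlike operator[], never inserts a new entry. This is again a hypothetical sketch under assumed type aliases, not the code from token.h.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real PhraseHunter types.
using DocID = unsigned int;
using PositionList = std::vector<unsigned int>;
using OccurrenceMap = std::map<DocID, PositionList>;

// find() only searches and never inserts, so the lookup works on a const map.
bool inDoc(const OccurrenceMap& occurrences, DocID docID) {
    return occurrences.find(docID) != occurrences.end();
}
```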
const PositionList& PhraseHunter::Token::documentOccurrences (DocID docID) [inline]
Returns a reference to the PositionList holding all offsets of this Token in the document docID.
Definition at line 87 of file token.h.
References m_occurrences.
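Unlike inDoc(), documentOccurrences() is declared without const. One plausible reason, which is an assumption since the body is not reproduced here, is that it looks the document up with std::map::operator[], which inserts an empty PositionList for an unknown DocID. The sketch contrasts that behavior with a const lookup that returns a shared empty list instead; both function names and the type aliases are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real PhraseHunter types.
using DocID = unsigned int;
using PositionList = std::vector<unsigned int>;
using OccurrenceMap = std::map<DocID, PositionList>;

// Variant 1: operator[] default-constructs and stores an empty PositionList
// for an unknown docID, so this lookup needs a non-const map.
const PositionList& occurrencesViaIndex(OccurrenceMap& occurrences, DocID docID) {
    return occurrences[docID];
}

// Variant 2: never modifies the map; returns a shared empty list when the
// document is absent.
const PositionList& occurrencesViaFind(const OccurrenceMap& occurrences, DocID docID) {
    static const PositionList empty;
    auto it = occurrences.find(docID);
    return it != occurrences.end() ? it->second : empty;
}
```

If the real method does behave like variant 1, querying a document the Token does not occur in would add an empty entry to m_occurrences, which would in turn affect isEmpty(), documentFrequency() and documentIDs(); checking inDoc() first avoids that.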
const OccurrenceMap& PhraseHunter::Token::allOccurrences () const [inline]
Return a reference to the entire OccurrenceMap of a Token.
Definition at line 90 of file token.h.
References m_occurrences.
std::vector< DocID > PhraseHunter::Token::documentIDs () const [virtual]
Returns the DocIDs of all documents in which this Token occurs.
Definition at line 33 of file token.cpp.
References m_occurrences.
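documentIDs() is defined in token.cpp and references m_occurrences. A plausible sketch, assuming OccurrenceMap is a std::map keyed by DocID, simply copies the map's keys. With std::map the IDs come back in ascending order; whether the real implementation guarantees any particular order is not documented here. The free function and type aliases are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real PhraseHunter types.
using DocID = unsigned int;
using PositionList = std::vector<unsigned int>;
using OccurrenceMap = std::map<DocID, PositionList>;

// Copy every key of the occurrence map; std::map iterates in ascending key order.
std::vector<DocID> documentIDs(const OccurrenceMap& occurrences) {
    std::vector<DocID> ids;
    ids.reserve(occurrences.size());
    for (const auto& entry : occurrences)
        ids.push_back(entry.first);
    return ids;
}
```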
virtual unsigned int PhraseHunter::Token::numTokens () const [inline, virtual]
Reimplemented in PhraseHunter::EmptyToken and PhraseHunter::Phrase.
virtual TokenID PhraseHunter::Token::id () const [inline, virtual]
Returns the ID for this Token.
Reimplemented in PhraseHunter::CorpusTokenBase.
Definition at line 96 of file token.h.
References PhraseHunter::InvalidTokenID.
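The base implementation of id() references PhraseHunter::InvalidTokenID, which suggests that a plain Token returns that sentinel and that CorpusTokenBase overrides id() with a real index ID. The sketch below models that reading only; the TokenID type, the sentinel's value, and the class names BaseToken and IndexedToken are all hypothetical.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in type and sentinel; the real InvalidTokenID value is not shown here.
using TokenID = unsigned int;
const TokenID InvalidTokenID = std::numeric_limits<TokenID>::max();

class BaseToken {
public:
    virtual ~BaseToken() = default;
    // A token that is not backed by an index has no ID of its own.
    virtual TokenID id() const { return InvalidTokenID; }
};

// Hypothetical analogue of a corpus-backed token that stores a real ID.
class IndexedToken : public BaseToken {
public:
    explicit IndexedToken(TokenID id) : m_id(id) {}
    TokenID id() const override { return m_id; }
private:
    TokenID m_id;
};
```

Callers can then compare `tok.id()` against the sentinel to tell whether a token is backed by the index.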
Member Data Documentation
schma::UnicodePtr PhraseHunter::Token::m_tokenstring [protected]
Definition at line 49 of file token.h.
Referenced by PhraseHunter::Phrase::length(), length(), and tokenString().
OccurrenceMap PhraseHunter::Token::m_occurrences [protected]
Definition at line 50 of file token.h.
Referenced by PhraseHunter::MutableToken::addOccurrence(), allOccurrences(), documentFrequency(), documentIDs(), documentOccurrences(), inDoc(), PhraseHunter::CorpusToken::insertPositions(), isEmpty(), and PhraseHunter::MutableToken::removeDocument().
const int PhraseHunter::Token::SPACE_BETWEEN_TWO_TOKENS = 2 [static, protected]
Generated by Doxygen 1.5.1.