/*
 * Copyright 1999-2002,2004 The Apache Software Foundation.
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * $Log$
 * Revision 1.48  2005/01/06 21:39:43  amassari
 * Removed warnings
 *
 * Revision 1.47  2004/12/14 16:16:36  cargilld
 * Fix for xercesc-684: Add accessor to XMLScanner to get the current grammar type.
 *
 * Revision 1.46  2004/12/09 20:31:39  knoaman
 * DOM L3: pass schema normalized value only when datatype-normalization feature
 * is enabled.
 *
 * Revision 1.45  2004/12/07 19:45:43  knoaman
 * An option to ignore a cached DTD grammar when a document contains an
 * internal and external subset.
 *
 * Revision 1.44  2004/12/03 19:40:30  cargilld
 * Change call to resolveEntity to pass in public id so that only one call to resolveEntity is needed (a follow-on to Alberto's fix).
 *
 * Revision 1.43  2004/10/12 21:33:05  peiyongz
 * Change attribute number threshold to 100
 *
 * Revision 1.42  2004/09/29 21:23:34  peiyongz
 * default implementation provided
 *
 * Revision 1.41  2004/09/29 19:00:02  peiyongz
 * [jira1207] --patch from Dan Rosen
 *
 * Revision 1.40  2004/09/28 21:27:38  peiyongz
 * Optimized duplicated attributes checking for large number of attributes
 *
 * Revision 1.39  2004/09/28 02:14:13  cargilld
 * Add support for validating annotations.
 *
 * Revision 1.38  2004/09/23 01:09:55  cargilld
 * Add support for generating synthetic XSAnnotations.  When a schema component has non-schema attributes and no child attributes create a synthetic XSAnnotation (under feature control) so the non-schema attributes can be recovered under PSVI.
 *
 * Revision 1.37  2004/09/08 13:56:14  peiyongz
 * Apache License Version 2.0
 *
 * Revision 1.36  2004/06/14 15:18:53  peiyongz
 * Consolidated End Of Line Handling
 *
 * Revision 1.35  2004/04/13 18:57:54  peiyongz
 * Irrelevant comment removal
 *
 * Revision 1.34  2004/04/13 16:56:58  peiyongz
 * IdentityConstraintHandler
 *
 * Revision 1.33  2004/04/07 14:15:12  peiyongz
 * allow internalDTD (conditionally) with grammar reusing
 *
 * Revision 1.32  2003/12/31 15:40:00  cargilld
 * Release memory when an error is encountered.
 *
 * Revision 1.31  2003/11/28 21:18:32  knoaman
 * Make use of canonical representation in PSVIElement
 *
 * Revision 1.30  2003/11/28 19:54:31  knoaman
 * PSVIElement update
 *
 * Revision 1.29  2003/11/27 22:52:37  knoaman
 * PSVIElement implementation
 *
 * Revision 1.28  2003/11/24 05:09:38  neilg
 * implement new, stateless, method for detecting duplicate attributes
 *
 * Revision 1.27  2003/11/13 15:00:44  peiyongz
 * Solve Compilation/Linkage error on AIX/Solaris/HP/Linux
 *
 * Revision 1.26  2003/11/12 20:29:47  peiyongz
 * Stateless Grammar: ValidationContext
 *
 * Revision 1.25  2003/11/06 15:30:06  neilg
 * first part of PSVI/schema component model implementation, thanks to David Cargill.  This covers setting the PSVIHandler on parser objects, as well as implementing XSNotation, XSSimpleTypeDefinition, XSIDCDefinition, and most of XSWildcard, XSComplexTypeDefinition, XSElementDeclaration, XSAttributeDeclaration and XSAttributeUse.
 *
 * Revision 1.24  2003/10/22 20:22:30  knoaman
 * Prepare for annotation support.
 *
 * Revision 1.23  2003/07/10 19:47:24  peiyongz
 * Stateless Grammar: Initialize scanner with grammarResolver,
 *                                creating grammar through grammarPool
 *
 * Revision 1.22  2003/05/16 21:36:58  knoaman
 * Memory manager implementation: Modify constructors to pass in the memory manager.
 *
 * Revision 1.21  2003/05/15 18:26:29  knoaman
 * Partial implementation of the configurable memory manager.
 *
 * Revision 1.20  2003/04/22 14:52:37  knoaman
 * Initialize security manager in constructor.
 *
 * Revision 1.19  2003/04/17 22:00:46  neilg
 * This commit implements detection of exponential entity
 * expansions inside the scanner code.  This is only done when a
 * security manager instance has been registered with the parser by
 * the application.  The default number of entities which may be
 * expanded is 50000; this appears to work very well for SAX, but DOM
 * parsing applications may wish to set this limit considerably lower.
 *
 * Added SecurityManager to enable detection of exponentially-expanding entities
 * 
 * Revision 1.18  2003/03/10 15:27:29  tng
 * XML1.0 Errata E38
 *
 * Revision 1.17  2003/03/07 18:08:58  tng
 * Return a reference instead of void for operator=
 *
 * Revision 1.16  2003/01/03 20:08:40  tng
 * New feature StandardUriConformant to force strict standard uri conformance.
 *
 * Revision 1.15  2002/12/27 16:16:51  knoaman
 * Set scanner options and handlers.
 *
 * Revision 1.14  2002/12/20 22:09:56  tng
 * XML 1.1
 *
 * Revision 1.13  2002/12/04 01:41:14  knoaman
 * Scanner re-organization.
 *
 * Revision 1.12  2002/11/04 14:58:19  tng
 * C++ Namespace Support.
 *
 * Revision 1.11  2002/08/27 05:56:39  knoaman
 * Identity Constraint: handle case of recursive elements.
 *
 * Revision 1.10  2002/08/16 15:46:17  knoaman
 * Bug 7698 : filenames with embedded spaces in schemaLocation strings not handled properly.
 *
 * Revision 1.9  2002/07/31 18:49:29  tng
 * [Bug 6227] Make method getLastExtLocation() constant.
 *
 * Revision 1.8  2002/07/11 18:22:13  knoaman
 * Grammar caching/preparsing - initial implementation.
 *
 * Revision 1.7  2002/06/17 16:13:01  tng
 * DOM L3: Add the flag fNormalizeData so that datatype normalization defined by schema is done only if asked.
 *
 * Revision 1.6  2002/06/07 18:35:49  tng
 * Add getReaderMgr in XMLScanner so that the parser can query encoding information.
 *
 * Revision 1.5  2002/05/30 16:20:57  tng
 * Add feature to optionally ignore external DTD.
 *
 * Revision 1.4  2002/05/27 18:42:14  tng
 * To get ready for 64 bit large file, use XMLSSize_t to represent line and column number.
 *
 * Revision 1.3  2002/05/22 20:54:33  knoaman
 * Prepare for DOM L3 :
 * - Make use of the XMLEntityHandler/XMLErrorReporter interfaces, instead of using
 * EntityHandler/ErrorHandler directly.
 * - Add 'AbstractDOMParser' class to provide common functionality for XercesDOMParser
 * and DOMBuilder.
 *
 * Revision 1.2  2002/03/25 20:25:32  knoaman
 * Move particle derivation checking from TraverseSchema to SchemaValidator.
 *
 * Revision 1.1.1.1  2002/02/01 22:22:03  peiyongz
 * sane_include
 *
 * Revision 1.38  2001/11/30 22:19:15  peiyongz
 * cleanUp function made member function
 * cleanUp object moved to file scope
 *
 * Revision 1.37  2001/11/20 18:51:44  tng
 * Schema: schemaLocation and noNamespaceSchemaLocation to be specified outside the instance document.  New methods setExternalSchemaLocation and setExternalNoNamespaceSchemaLocation are added (for SAX2, two new properties are added).
 *
 * Revision 1.36  2001/11/13 13:27:28  tng
 * Move root element check to XMLScanner.
 *
 * Revision 1.35  2001/11/02 14:20:14  knoaman
 * Add support for identity constraints.
 *
 * Revision 1.34  2001/10/12 20:52:18  tng
 * Schema: Find the attributes see if they should be (un)qualified.
 *
 * Revision 1.33  2001/09/10 15:16:04  tng
 * Store the fGrammarType instead of calling getGrammarType all the time for faster performance.
 *
 * Revision 1.32  2001/09/10 14:06:22  tng
 * Schema: AnyAttribute support in Scanner and Validator.
 *
 * Revision 1.31  2001/08/13 15:06:39  knoaman
 * update <any> validation.
 *
 * Revision 1.30  2001/08/02 16:54:39  tng
 * Reset some Scanner flags in scanReset().
 *
 * Revision 1.29  2001/08/01 19:11:01  tng
 * Add full schema constraint checking flag to the samples and the parser.
 *
 * Revision 1.28  2001/07/24 21:23:39  tng
 * Schema: Use DatatypeValidator for ID/IDREF/ENTITY/ENTITIES/NOTATION.
 *
 * Revision 1.27  2001/07/13 16:56:48  tng
 * ScanId fix.
 *
 * Revision 1.26  2001/07/12 18:50:17  tng
 * Some performance modification regarding standalone check and xml decl check.
 *
 * Revision 1.25  2001/07/10 21:09:31  tng
 * Give proper error message when scanning external id.
 *
 * Revision 1.24  2001/07/09 13:42:08  tng
 * Partial Markup in Parameter Entity is validity constraint and thus should be just error, not fatal error.
 *
 * Revision 1.23  2001/07/05 13:12:11  tng
 * Standalone checking is validity constraint and thus should be just error, not fatal error:
 *
 * Revision 1.22  2001/06/22 12:42:33  tng
 * [Bug 2257] 1.5 thinks a <?xml-stylesheet ...> tag is a <?xml ...> tag
 *
 * Revision 1.21  2001/06/04 20:59:29  jberry
 * Add method incrementErrorCount for use by validator. Make sure to reset error count in _both_ the scanReset methods.
 *
 * Revision 1.20  2001/06/03 19:21:40  jberry
 * Add support for tracking error count during parse; enables simple parse without requiring error handler.
 *
 * Revision 1.19  2001/05/28 20:55:02  tng
 * Schema: allocate a fDTDValidator, fSchemaValidator explicitly to avoid wrong cast
 *
 * Revision 1.18  2001/05/11 15:17:28  tng
 * Schema: Nillable fixes.
 *
 * Revision 1.17  2001/05/11 13:26:17  tng
 * Copyright update.
 *
 * Revision 1.16  2001/05/03 20:34:29  tng
 * Schema: SchemaValidator update
 *
 * Revision 1.15  2001/05/03 19:09:09  knoaman
 * Support Warning/Error/FatalError messaging.
 * Validity constraints errors are treated as errors, with the ability by user to set
 * validity constraints as fatal errors.
 *
 * Revision 1.14  2001/04/19 18:16:59  tng
 * Schema: SchemaValidator update, and use QName in Content Model
 *
 * Revision 1.13  2001/03/30 16:46:56  tng
 * Schema: Use setDoSchema instead of setSchemaValidation which makes more sense.
 *
 * Revision 1.12  2001/03/30 16:35:06  tng
 * Schema: Whitespace normalization.
 *
 * Revision 1.11  2001/03/21 21:56:05  tng
 * Schema: Add Schema Grammar, Schema Validator, and split the DTDValidator into DTDValidator, DTDScanner, and DTDGrammar.
 *
 * Revision 1.10  2001/02/15 15:56:27  tng
 * Schema: Add setSchemaValidation and getSchemaValidation for DOMParser and SAXParser.
 * Add feature "http://apache.org/xml/features/validation/schema" for SAX2XMLReader.
 * New data field  fSchemaValidation in XMLScanner as the flag.
 *
 * Revision 1.9  2000/04/12 22:58:28  roddey
 * Added support for 'auto validate' mode.
 *
 * Revision 1.8  2000/03/03 01:29:32  roddey
 * Added a scanReset()/parseReset() method to the scanner and
 * parsers, to allow for reset after early exit from a progressive parse.
 * Added calls to new Terminate() call to all of the samples. Improved
 * documentation in SAX and DOM parsers.
 *
 * Revision 1.7  2000/03/02 19:54:30  roddey
 * This checkin includes many changes done while waiting for the
 * 1.1.0 code to be finished. I can't list them all here, but a list is
 * available elsewhere.
 *
 * Revision 1.6  2000/02/24 20:18:07  abagchi
 * Swat for removing Log from API docs
 *
 * Revision 1.5  2000/02/06 07:47:54  rahulj
 * Year 2K copyright swat.
 *
 * Revision 1.4  2000/01/24 20:40:43  roddey
 * Exposed the APIs to get to the byte offset in the source XML buffer. This stuff
 * is not tested yet, but I wanted to get the API changes in now so that the API
 * can be stabilized.
 *
 * Revision 1.3  2000/01/12 23:52:46  roddey
 * These are trivial changes required to get the C++ and Java versions
 * of error messages more into sync. Mostly it was where the Java version
 * was passing out one or more parameter than the C++ version was. In
 * some cases the change just required an extra parameter to get the
 * needed info to the place where the error was issued.
 *
 * Revision 1.2  2000/01/12 00:15:04  roddey
 * Changes to deal with multiply nested, relative pathed, entities and to deal
 * with the new URL class changes.
 *
 * Revision 1.1.1.1  1999/11/09 01:08:23  twl
 * Initial checkin
 *
 * Revision 1.4  1999/11/08 20:44:52  rahul
 * Swat for adding in Product name and CVS comment log variable.
 *
 */


#if !defined(XMLSCANNER_HPP)
#define XMLSCANNER_HPP

#include <xercesc/framework/XMLBufferMgr.hpp>
#include <xercesc/framework/XMLErrorCodes.hpp>
#include <xercesc/framework/XMLRefInfo.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/NameIdPool.hpp>
#include <xercesc/util/RefHashTableOf.hpp>
#include <xercesc/util/SecurityManager.hpp>
#include <xercesc/internal/ReaderMgr.hpp>
#include <xercesc/internal/ElemStack.hpp>
#include <xercesc/validators/DTD/DTDEntityDecl.hpp>
#include <xercesc/framework/XMLAttr.hpp>
#include <xercesc/framework/ValidationContext.hpp>
#include <xercesc/validators/common/GrammarResolver.hpp>

XERCES_CPP_NAMESPACE_BEGIN

class InputSource;
class XMLDocumentHandler;
class XMLEntityHandler;
class ErrorHandler;
class DocTypeHandler;
class XMLPScanToken;
class XMLStringPool;
class Grammar;
class XMLValidator;

struct PSVIElemContext
{
    bool               fIsSpecified;
    bool               fErrorOccurred;
    int                fElemDepth;
    int                fFullValidationDepth;
    int                fNoneValidationDepth;
    DatatypeValidator* fCurrentDV;
    ComplexTypeInfo*   fCurrentTypeInfo;
    const XMLCh*       fNormalizedValue;
};

//  This is the mondo scanner class, which does the vast majority of the
//  work of parsing. It handles reading in input and spitting out events
//  to installed handlers.
class XMLPARSER_EXPORT XMLScanner : public XMemory, public XMLBufferFullHandler
{
public :
    // -----------------------------------------------------------------------
    //  Public class types
    //
    //  NOTE: These should really be private, but some of the compilers we
    //  have to deal with are too stupid to understand this.
    //
    //  DeclTypes
    //      Used by scanXMLDecl() to know what type of decl it should scan.
    //      Text decls have slightly different rules from XMLDecls.
    //
    //  EntityExpRes
    //      These are the values returned from the entity expansion method,
    //      to indicate how it went.
    //
    //  XMLTokens
    //      These represent the possible types of input we can get while
    //      scanning content.
    //
    //  ValSchemes
    //      This indicates what the scanner should do in terms of validation.
    //      'Auto' means if there is any int/ext subset, then validate. Else,
    //      don't.
    // -----------------------------------------------------------------------
    enum DeclTypes
    {
        Decl_Text
        , Decl_XML
    };

    enum EntityExpRes
    {
        EntityExp_Pushed
        , EntityExp_Returned
        , EntityExp_Failed
    };

    enum XMLTokens
    {
        Token_CData
        , Token_CharData
        , Token_Comment
        , Token_EndTag
        , Token_EOF
        , Token_PI
        , Token_StartTag
        , Token_Unknown
    };

    enum ValSchemes
    {
        Val_Never
        , Val_Always
        , Val_Auto
    };


    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    XMLScanner
    (
        XMLValidator* const valToAdopt
        , GrammarResolver* const grammarResolver
        , MemoryManager* const manager = XMLPlatformUtils::fgMemoryManager
    );
    XMLScanner
    (
        XMLDocumentHandler* const  docHandler
        , DocTypeHandler* const    docTypeHandler
        , XMLEntityHandler* const  entityHandler
        , XMLErrorReporter* const  errReporter
        , XMLValidator* const      valToAdopt
        , GrammarResolver* const grammarResolver
        , MemoryManager* const manager = XMLPlatformUtils::fgMemoryManager
    );
    virtual ~XMLScanner();


    // -----------------------------------------------------------------------
    //  Error emitter methods
    // -----------------------------------------------------------------------
    bool emitErrorWillThrowException(const XMLErrs::Codes toEmit);
    void emitError(const XMLErrs::Codes toEmit);
    void emitError
    (
        const   XMLErrs::Codes    toEmit
        , const XMLCh* const        text1
        , const XMLCh* const        text2 = 0
        , const XMLCh* const        text3 = 0
        , const XMLCh* const        text4 = 0
    );
    void emitError
    (
        const   XMLErrs::Codes    toEmit
        , const char* const         text1
        , const char* const         text2 = 0
        , const char* const         text3 = 0
        , const char* const         text4 = 0
    );

    // -----------------------------------------------------------------------
    //  Implementation of XMLBufferFullHandler interface
    // -----------------------------------------------------------------------

    virtual bool bufferFull(XMLBuffer& toSend)
    {
        sendCharData(toSend);
        return true;
    }

    virtual Grammar::GrammarType getCurrentGrammarType() const;

    // -----------------------------------------------------------------------
    //  Public pure virtual methods
    // -----------------------------------------------------------------------
    virtual const XMLCh* getName() const = 0;
    virtual NameIdPool<DTDEntityDecl>* getEntityDeclPool() = 0;
    virtual const NameIdPool<DTDEntityDecl>* getEntityDeclPool() const = 0;
    virtual unsigned int resolveQName
    (
        const   XMLCh* const        qName
        ,       XMLBuffer&          prefixBufToFill
        , const short               mode
        ,       int&                prefixColonPos
    ) = 0;
    virtual void scanDocument
    (
        const   InputSource&    src
    ) = 0;
    virtual bool scanNext(XMLPScanToken& toFill) = 0;
    virtual Grammar* loadGrammar
    (
        const   InputSource&    src
        , const short           grammarType
        , const bool            toCache = false
    ) = 0;

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const XMLDocumentHandler* getDocHandler() const;
    XMLDocumentHandler* getDocHandler();
    const DocTypeHandler* getDocTypeHandler() const;
    DocTypeHandler* getDocTypeHandler();
    bool getDoNamespaces() const;
    ValSchemes getValidationScheme() const;
    bool getDoSchema() const;
    bool getValidationSchemaFullChecking() const;
    bool getIdentityConstraintChecking() const;
    const XMLEntityHandler* getEntityHandler() const;
    XMLEntityHandler* getEntityHandler();
    const XMLErrorReporter* getErrorReporter() const;
    XMLErrorReporter* getErrorReporter();
    const ErrorHandler* getErrorHandler() const;
    ErrorHandler* getErrorHandler();
    const PSVIHandler* getPSVIHandler() const;
    PSVIHandler* getPSVIHandler();
    bool getExitOnFirstFatal() const;
    bool getValidationConstraintFatal() const;
    RefHashTableOf<XMLRefInfo>* getIDRefList();
    const RefHashTableOf<XMLRefInfo>* getIDRefList() const;

    ValidationContext*   getValidationContext();

    bool getInException() const;
    /*bool getLastExtLocation
    (
                XMLCh* const    sysIdToFill
        , const unsigned int    maxSysIdChars
        ,       XMLCh* const    pubIdToFill
        , const unsigned int    maxPubIdChars
        ,       XMLSSize_t&     lineToFill
        ,       XMLSSize_t&     colToFill
    ) const;*/
    const Locator* getLocator() const;
    const ReaderMgr* getReaderMgr() const;
    unsigned int getSrcOffset() const;
    bool getStandalone() const;
    const XMLValidator* getValidator() const;
    XMLValidator* getValidator();
    int getErrorCount();
    const XMLStringPool* getURIStringPool() const;
    XMLStringPool* getURIStringPool();
    bool getHasNoDTD() const;
    XMLCh* getExternalSchemaLocation() const;
    XMLCh* getExternalNoNamespaceSchemaLocation() const;
    SecurityManager* getSecurityManager() const;
    bool getLoadExternalDTD() const;
    bool isCachingGrammarFromParse() const;
    bool isUsingCachedGrammarInParse() const;
    bool getCalculateSrcOfs() const;
    Grammar* getRootGrammar() const;
    XMLReader::XMLVersion getXMLVersion() const;
    MemoryManager* getMemoryManager() const;
    ValueVectorOf<PrefMapElem*>* getNamespaceContext() const;
    unsigned int getPrefixId(const XMLCh* const prefix) const;
    const XMLCh* getPrefixForId(unsigned int prefId) const;

    bool getGenerateSyntheticAnnotations() const;
    bool getValidateAnnotations() const;
    bool getIgnoreCachedDTD() const;
    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    /**
      * When an attribute name has no prefix, unlike elements, it is not mapped
      * to the global namespace. So, in order to have something to map it to
      * for practical purposes, an id for an empty URL is created and used for
      * such names.
      *
      * @return The URL pool id of the URL for an empty URL "".
      */
    unsigned int getEmptyNamespaceId() const;

    /**
      * When a prefix is found that has not been mapped, an error is issued.
      * However, if the parser has been instructed not to stop on the first
      * fatal error, it needs to be able to continue. To do so, it will map
      * that prefix to this magic unknown namespace id.
      *
      * @return The URL pool id of the URL for the unknown prefix
      *         namespace.
      */
    unsigned int getUnknownNamespaceId() const;

    /**
      * The prefix 'xml' is a magic prefix, defined by the XML spec and
      * requiring no prior definition. This method returns the id for the
      * intrinsically defined URL for this prefix.
      *
      * @return The URL pool id of the URL for the 'xml' prefix.
      */
    unsigned int getXMLNamespaceId() const;

    /**
      * The prefix 'xmlns' is a magic prefix, defined by the namespace spec
      * and requiring no prior definition. This method returns the id for the
      * intrinsically defined URL for this prefix.
      *
      * @return The URL pool id of the URL for the 'xmlns' prefix.
      */
    unsigned int getXMLNSNamespaceId() const;

    /**
      * This method finds the passed URI id in its URI pool and
      * copies the text of that URI into the passed buffer.
      */
    bool getURIText
    (
        const   unsigned int    uriId
        ,       XMLBuffer&      uriBufToFill
    )   const;

    const XMLCh* getURIText(const   unsigned int    uriId) const;

    /* tell if the validator comes from user */
    bool isValidatorFromUser();

    /* tell if standard URI are forced */
    bool getStandardUriConformant() const;

    // -----------------------------------------------------------------------
    //  Setter methods
    // -----------------------------------------------------------------------
    void setDocHandler(XMLDocumentHandler* const docHandler);
    void setDocTypeHandler(DocTypeHandler* const docTypeHandler);
    void setDoNamespaces(const bool doNamespaces);
    void setEntityHandler(XMLEntityHandler* const entityHandler);
    void setErrorReporter(XMLErrorReporter* const errHandler);
    void setErrorHandler(ErrorHandler* const handler);
    void setPSVIHandler(PSVIHandler* const handler);
    void setURIStringPool(XMLStringPool* const stringPool);
    void setExitOnFirstFatal(const bool newValue);
    void setValidationConstraintFatal(const bool newValue);
    void setValidationScheme(const ValSchemes newScheme);
    void setValidator(XMLValidator* const valToAdopt);
    void setDoSchema(const bool doSchema);
    void setValidationSchemaFullChecking(const bool schemaFullChecking);
    void setIdentityConstraintChecking(const bool identityConstraintChecking);
    void setHasNoDTD(const bool hasNoDTD);
    void cacheGrammarFromParse(const bool newValue);
    void useCachedGrammarInParse(const bool newValue);
    void setRootElemName(XMLCh* rootElemName);
    void setExternalSchemaLocation(const XMLCh* const schemaLocation);
    void setExternalNoNamespaceSchemaLocation(const XMLCh* const noNamespaceSchemaLocation);
    void setExternalSchemaLocation(const char* const schemaLocation);
    void setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation);
    void setSecurityManager(SecurityManager* const securityManager);
    void setLoadExternalDTD(const bool loadDTD);
    void setNormalizeData(const bool normalizeData);
    void setCalculateSrcOfs(const bool newValue);
    void setParseSettings(XMLScanner* const refScanner);
    void setStandardUriConformant(const bool newValue);
    void setInputBufferSize(const size_t bufferSize);

    void setGenerateSyntheticAnnotations(const bool newValue);
    void setValidateAnnotations(const bool newValue);
    void setIgnoredCachedDTD(const bool newValue);
    // -----------------------------------------------------------------------
    //  Mutator methods
    // -----------------------------------------------------------------------
    void incrementErrorCount(void);         // For use by XMLValidator

    // -----------------------------------------------------------------------
    //  Deprecated methods as of 3.2.0. Use getValidationScheme() and
    //  setValidationScheme() instead.
    // -----------------------------------------------------------------------
    bool getDoValidation() const;
    void setDoValidation(const bool validate);

    // -----------------------------------------------------------------------
    //  Document scanning methods
    //
    //  scanDocument() scans the entire source document. scanFirst(),
    //  scanNext(), and scanReset() support a progressive parse.
    // -----------------------------------------------------------------------
    void scanDocument
    (
        const   XMLCh* const    systemId
    );
    void scanDocument
    (
        const   char* const     systemId
    );

    bool scanFirst
    (
        const   InputSource&    src
        ,       XMLPScanToken&  toFill
    );
    bool scanFirst
    (
        const   XMLCh* const    systemId
        ,       XMLPScanToken&  toFill
    );
    bool scanFirst
    (
        const   char* const     systemId
        ,       XMLPScanToken&  toFill
    );

    void scanReset(XMLPScanToken& toFill);

    bool checkXMLDecl(bool startWithAngle);

    // -----------------------------------------------------------------------
    //  Grammar preparsing methods
    // -----------------------------------------------------------------------
    Grammar* loadGrammar
    (
        const   XMLCh* const    systemId
        , const short           grammarType
        , const bool            toCache = false
    );
    Grammar* loadGrammar
    (
        const   char* const     systemId
        , const short           grammarType
        , const bool            toCache = false
    );

    // -----------------------------------------------------------------------
    //  Notification that lazy data has been deleted
    // -----------------------------------------------------------------------
    static void reinitScannerMutex();
    static void reinitMsgLoader();

protected:
    // -----------------------------------------------------------------------
    //  Protected pure virtual methods
    // -----------------------------------------------------------------------
    virtual void scanCDSection() = 0;
    virtual void scanCharData(XMLBuffer& toUse) = 0;
    virtual EntityExpRes scanEntityRef
    (
        const   bool    inAttVal
        ,       XMLCh&  firstCh
        ,       XMLCh&  secondCh
        ,       bool&   escaped
    ) = 0;
    virtual void scanDocTypeDecl() = 0;
    virtual void scanReset(const InputSource& src) = 0;
    virtual void sendCharData(XMLBuffer& toSend) = 0;

    // The returned InputSource is owned by the caller.
    virtual InputSource* resolveSystemId(const XMLCh* const /*sysId*/
                                        ,const XMLCh* const /*pubId*/) {return 0;}
    // -----------------------------------------------------------------------
    //  Protected scanning methods
    // -----------------------------------------------------------------------
    bool scanCharRef(XMLCh& toFill, XMLCh& second);
    void scanComment();
    bool scanEq(bool inDecl = false);
    void scanMiscellaneous();
    void scanPI();
    void scanProlog();
    void scanXMLDecl(const DeclTypes type);

    // -----------------------------------------------------------------------
    //  Private helper methods
    // -----------------------------------------------------------------------
    void checkInternalDTD(bool hasExtSubset, const XMLCh* const sysId, const XMLCh* const pubId);
    void checkIDRefs();
    bool isLegalToken(const XMLPScanToken& toCheck);
    XMLTokens senseNextToken(unsigned int& orgReader);
    void initValidator(XMLValidator* theValidator);
    inline void resetValidationContext();
    unsigned int *getNewUIntPtr();
    void resetUIntPool();
    void recreateUIntPool();
    inline
    void setAttrDupChkRegistry
         (
            const unsigned int &attrNumber
          ,       bool         &toUseHashTable
         );

    // -----------------------------------------------------------------------
    //  Data members
    //
    //  fBufferSize
    //      Maximum input buffer size
    //
    //  fAttrList
    //      Every time we get a new element start tag, we have to pass to
    //      the document handler the attributes found. To make it more
    //      efficient we keep this ref vector of XMLAttr objects around. We
    //      just reuse it over and over, allowing it to grow to meet the
    //
    //  fBufMgr
    //      This is a manager for temporary buffers used during scanning.
    //      For efficiency we must use a set of static buffers, but we have
    //      to ensure that they are not incorrectly reused. So this manager
    //      provides the smarts to hand out buffers as required.
    //
    //  fDocHandler
    //      The client code's document handler. If zero, then no document
    //      handler callouts are done. We don't adopt it.
    //
    //  fDocTypeHandler
    //      The client code's document type handler (used by DTD Validator).
    //
    //  fDoNamespaces
    //      This flag indicates whether the client code wants us to do
    //      namespaces or not. If the installed validator indicates that it
    //      has to do namespaces, then this is ignored.
    //
    //  fEntityHandler
    //      The client code's entity handler. If zero, then no entity handler
    //      callouts are done. We don't adopt it.
    //
    //  fErrorReporter
    //      The client code's error reporter. If zero, then no error reporter
    //      callouts are done. We don't adopt it.
    //
    //  fErrorHandler
    //      The client code's error handler.  Need to store this info for
    //      Schema parse error handling.
    //
    //  fPSVIHandler
    //      The client code's PSVI handler.
    //
    //  fExitOnFirstFatal
    //      This indicates whether we bail out on the first fatal XML error
    //      or not. It defaults to true, which is the strict XML way, but it
    //      can be changed.
    //
    //  fValidationConstraintFatal
    //      This indicates whether we treat validation constraint errors as
    //      fatal errors or not. It defaults to false, but it can be changed.
    //
    //  fIDRefList
    //      This is a list of XMLRefInfo objects. This member lets us do all
    //      needed ID-IDREF balancing checks.
    //
    //  fInException
    //      To avoid circular re-entry when we catch an exception and emit
    //      it, which would normally throw again if the 'fail on first error'
    //      flag is on.
    //
    //  fReaderMgr
    //      This is the reader manager, from which we get characters. It
    //      manages the reader stack for us, and provides a lot of convenience
    //      methods to do specialized checking for chars, sequences of chars,
    //      skipping chars, etc...
    //
    //  fScannerId
    //  fSequenceId
    //      These are used for progressive parsing, to make sure that the
    //      client code does the right thing at the right time.
    //
    //  fStandalone
    //      Indicates whether the document is standalone or not. Defaults to
    //      no, but can be overridden in the XMLDecl.
    //
    //  fHasNoDTD
    //      Indicates the document has no DTD or has only an internal DTD subset
    //      which contains no parameter entity references.
    //
    //  fValidate
    //      Indicates whether any validation should be done. This is defined
    //      by the existence of a Grammar together with fValScheme.
    //
    //  fValidator
    //      The installed validator. We look at it via the abstract
    //      validator interface, and don't know what it actually is.
    //      It points either to the user's installed validator, or to
    //      fDTDValidator or fSchemaValidator.
    //
    //  fValidatorFromUser
    //      This flag indicates whether the validator was installed by the
    //      user.  If false, then the validator was created by the Scanner.
    //
    //  fValScheme
    //      This is the currently set validation scheme. It defaults to
    //      'never', but can be set by the client.
    //
    //  fErrorCount
    //      The number of errors we've encountered.
    //
    //  fDoSchema
    //      This flag indicates whether the client code wants Schema to
    //      be processed or not.
    //
    //  fSchemaFullChecking
    //      This flag indicates whether the client code wants full Schema
    //      constraint checking.
    //
    //  fIdentityConstraintChecking
    //      This flag indicates whether the client code wants Identity
    //      Constraint checking, defaulted to true to maintain backward
    //      compatibility (to minimize surprise)
    //
    //  fAttName
    //  fAttValue
    //  fCDataBuf
    //  fNameBuf
    //  fQNameBuf
    //  fPrefixBuf
    //      For the most part, buffers are obtained from the fBufMgr object
    //      on the fly. However, for the start tag scan, we have a set of
    //      fixed buffers for performance reasons. These are used a lot and
    //      there are a number of them, so asking the buffer manager each
    //      time for new buffers is a bit too much overhead.
    //
    //  fEmptyNamespaceId
    //      This is the id of the empty namespace URI. This is a special one
    //      because of the xmlns="" type of deal. We have to quickly sense
    //      that it's the empty namespace.
    //
    //  fUnknownNamespaceId
    //      This is the id of the namespace URI which is assigned to the
    //      global namespace. It's for debug purposes only, since there is no
    //      real global namespace URI. It's set by the derived class.
    //
    //  fXMLNamespaceId
    //  fXMLNSNamespaceId
    //      These are the ids of the namespace URIs which are assigned to the
    //      'xml' and 'xmlns' special prefixes. The former is officially
    //      defined but the latter is not, so we just provide one for debug
    //      purposes.
    //
    //  fSchemaNamespaceId
    //      This is the id of the schema namespace URI.
    //
    //  fGrammarResolver
    //      Grammar Pool that stores all the grammars. Key is namespace for
    //      schema and system id for external DTD. When caching a grammar, if
    //      a grammar is already in the pool, it will be replaced with the
    //      new parsed one.
    //
    //  fGrammar
    //      Current Grammar used by the Scanner and Validator
    //
    //  fRootGrammar
    //      The grammar where the root element is declared.
    //
    //  fGrammarType
    //      Current Grammar Type.  Store this value instead of calling getGrammarType
    //      all the time for faster performance.
    //
    //  fURIStringPool
    //      This is a pool for URIs with unique ids assigned. We use a standard
    //      string pool class.  This pool is shared by all Grammars.
    //      Used only if namespaces are turned on.
    //
    //  fRootElemName
    //      Whether we are using a DTD or a Schema Grammar, if a DOCTYPE exists,
    //      we need to verify the root element name.  So store the rootElement
    //      that is used in the DOCTYPE in the Scanner instead of in the DTDGrammar
    //      where it used to be.
    //
    //  fExternalSchemaLocation
    //      The list of Namespace/SchemaLocation that was specified externally
    //      using setExternalSchemaLocation.
    //
    //  fExternalNoNamespaceSchemaLocation
    //      The no target namespace XML Schema Location that was specified
    //      externally using setExternalNoNamespaceSchemaLocation.
    //
    //  fSecurityManager
    //      The SecurityManager instance; as and when set by the application.
    //
    //      The number of entity expansions to be permitted while processing this document
    //      Only meaningful when fSecurityManager != 0
    //
    //      The number of general entities expanded so far in this document.
    //      Only meaningful when fSecurityManager != 0
    //
    //  fLoadExternalDTD
    //      This flag indicates whether the external DTD should be loaded or not
    //
    //  fNormalizeData
    //      This flag indicates whether the parser should perform datatype
    //      normalization that is defined in the schema.
    //
    //  fCalculateSrcOfs
    //      This flag indicates the parser should calculate the source offset.
    //      Turning this on may impact performance.
    //
    //  fStandardUriConformant
    //      This flag controls whether we enforce conformant URIs
    //
    //  fXMLVersion
    //      Enum to indicate if the main doc is XML 1.1 or XML 1.0 conformant
    //
    //  fUIntPool
    //      pool of unsigned integers to help with duplicate attribute
    //      detection and filling in default/fixed attributes
    //
    //  fUIntPoolRow