
CHAPTER 1
INTRODUCTION
DATA MINING
Data mining is the process of extracting knowledge from massive amounts of data. It is used for locating information such as patterns, associations, anomalies and significant structures in huge amounts of data kept in databases, data warehouses, or other data repositories. Owing to the availability of huge amounts of data in electronic form, and the need to turn such data into useful information and knowledge for broad applications such as market analysis, business analysis, and data processing, data mining has attracted a great deal of attention in the information industry.

Data mining has been popularly treated as a synonym for Knowledge Discovery in Databases (KDD), while others view it as an essential step within the process of knowledge discovery. Knowledge discovery as a process consists of an iterative sequence of the following steps:
Data cleaning (to remove noise or irrelevant data),
Data integration (where multiple data sources may be combined) [1],
Data selection (where data relevant to the analysis task are retrieved from the database),
Data transformation (where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations),
Data mining (an essential process where intelligent methods are applied in order to extract data patterns),
Pattern evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures), and
Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user).

Data Mining Tasks
Data mining tasks may be classified into two classes: descriptive data mining and predictive data mining.

Summarization is the generalization or abstraction of data. A collection of relevant data is abstracted and summarized, resulting in a smaller set that provides a general overview of the data.

Clustering is segregating similar groups from unstructured data. It is the task of grouping a collection of objects in such a way that objects in the same group are more similar to one another than to those in other groups. Once the clusters are segregated, the objects are labeled with their corresponding clusters, and the common features of the objects in a cluster can be summarized to form a class description.

Classification is learning rules that can be applied to new data and usually comprises the following steps: preprocessing of data, model design, learning or feature selection, and validation/evaluation. Classification predicts categorical class labels rather than continuous-valued functions. Classification is the derivation of a model that determines the class of an object based on its attributes. A collection of objects is given as a training set in which each object is represented by a vector of attributes together with its class. By analyzing the relationship between the attributes and the class of the objects in the training set, a classification model can be built.

Regression is finding a function that models the data with the least error. It is a statistical methodology that is most frequently used for numeric prediction. Regression analysis is widely used for prediction and forecasting, where it has substantial overlap with the field of machine learning. Regression analysis is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships.
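The idea of a function with the least error can be illustrated with ordinary least squares on a straight line. The following is a minimal sketch only; the function name `fit_line` and the sample points are assumptions made for illustration and are not part of the project.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms of the closed-form least-squares solution.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With noisy data the fitted line no longer passes through every point, but it still minimizes the total squared error.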

Association is looking for relationships between variables or objects. It aims to extract interesting associations, correlations or causal structures among objects, i.e. cases where the appearance of one set of objects is linked to the appearance of another set of objects. Association rules can be useful for marketing, commodity management, advertising, etc. Association rule learning is a popular and well-researched technique for discovering interesting relations between variables in large databases.
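Two standard measures for an association rule are its support and its confidence. The sketch below computes both over a small set of hypothetical review keywords; the data and the function names are illustrative assumptions, not taken from the project dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical keyword sets, one per review.
reviews = [
    {"clean", "location", "staff"},
    {"clean", "staff"},
    {"location", "noisy"},
    {"clean", "location"},
]
rule_support = support(reviews, {"clean", "staff"})           # 2 of 4 reviews
rule_confidence = confidence(reviews, {"clean"}, {"staff"})   # 2 of the 3 "clean" reviews
```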

MOTIVATION
Data mining is one way of handling huge volumes of information when mining competitors. With a huge amount of unstructured review data, both the competitor and the customer face the crucial challenge of extracting useful information. The project is a recommender system for both the customer and the competitor: an information filtering system that seeks to predict the rating or review that a customer would provide. The dataset is collected from online sources and consists of customer reviews of hotels. Preprocessing removes irrelevant data, and the processed data is then analyzed to identify the top-k business competitors in a particular location of a city.
Customers find it difficult to choose the best hotel to visit and enjoy. A customer can find hotel reviews in web search results, but these do not provide proper information, which leads to confusion when choosing a hotel. To raise their competitive level, competitors collect feedback from customers, which helps them address the negative comments about the hotel. The motivation of the project is to overcome the above problem and to let customers make a clear decision based on the analysis of reviews. Similarly, it lets hotel competitors identify steps to improve their service.

OBJECTIVE
To identify the hotel competitors based on the customer reviews of the business.

To determine the improvements needed in the hotel business.

To identify fake reviews by unauthorized users.

To recommend the best hotel to the customers.

ORGANIZATION OF THE THESIS
This section gives a short description of each chapter. Chapter 1 provides a general introduction to data mining and to the project, and describes the motivation and objectives of the project. Chapter 2 is a literature survey of the various approaches used and how they can be used to identify business competitors in the project. Chapter 3 explains the algorithms used for the components involved, the tool used, and the dataset used for analysis. Chapter 4 describes the implementation of the project and the process followed to achieve its objectives. Chapter 5 gives the conclusion and the future action plan.

CHAPTER 2
LITERATURE SURVEY
When mining the competitors of a given item, the most influential factors of the item that satisfy customer needs can be extracted from data that is typically stored in a database. This chapter surveys two types of literature: competitor mining and unstructured data management. Unstructured data sources come in different formats that do not fall under any predefined category. When managing thousands of customers, a business will have difficulty sustaining the rising costs created by interactions among people.

2.1 ONLINE REVIEWS:
Jin et al. [1]: Information from the web presents customer opinion from different perspectives. Each customer has different opinions, and competitors are analyzed from large volumes of web information. Therefore, one of the best competitive strategies is the successful utilization of web data for decision support.

Customer reviews for business competitor mining are collected through several methods and are usually unstructured data. Most data mining technologies can only handle structured data. So, during the mining process, unstructured data is not taken into account and much valuable service information is lost. Structured systems are those where the data and the computing activity are predetermined and well-defined. Unstructured systems are those that have no predetermined form or structure and are usually full of textual data. Typical unstructured data include email, reports, letters, and other communications.

2.2 ANALYSIS OF COMPETITORS IN BUSINESS:
Lappas et al. [2]: Competitive mining is done on different domains in order to obtain appropriate results. The customer issues queries according to their preferences and requests matching results from the search engine. Finally, the customer goes with the choice of the search engine. Sometimes the exact customer preference is not identified, but the customer goes with the best of the search results obtained, which match only a few of the preferences. However, this technique runs into many problems, such as finding the top-n business competitors of an item and handling structured data.

Li et al. [3]: To accomplish competitor mining, competitive information is required, such as information from the web about the company, its products or the people who work in that company. The algorithm, called "CoMiner", extracts a set of comparative items for the input entity, ranks them according to their comparability, and finally finds the competitive items. CoMiner is usually designed to support a particular domain. The disadvantage of CoMiner is that for many domains competitors will be difficult to identify.

Pant et al. [4]: Web footprint refers to information from online metrics used for top-competitor identification. A firm's web site presents the firm's activities, products and services to its various stakeholders. The approach is based on the data, firm links and website information that are stored as logs to identify the presence of online isomorphism, here competitive isomorphism, which is the phenomenon of competing firms becoming similar as they mimic each other under common market services. Predictive models for competitor identification based on online metrics are better supported than those based on offline data. The technology combines online and offline metrics to boost performance.

Social media platforms such as Twitter and Facebook are popular information exchange platforms that are increasingly used by firms to communicate with various stakeholders. Online news stories that mention the firm are also available on the web from a large number of news sources.

Shenghua Bao et al. [5]: The approach is able to resolve ambiguity by providing the input entity with additional restrictions. CoMiner is an algorithm for discovering competitors, their competitive domains, and detailed competitive evidence by mining web resources. CoMiner extracts the competitive domains in which the given entity and its competitors play against each other by mining salient phrases from a set of web phrases.

2.3 RATING:
Li et al. [6]: Ranking methods are used to present competitors in ranked order. Data from location-based social media are used for ranking competitors. The PageRank model and its variants are used to obtain the competitive rank of firms.
However, mining competitors from social media raises many privacy-related issues. Also, social media information is not always accurate, so the prediction of competitors may lead to incorrect results.

Tania Ferreira et al. [7]: Gathering knowledge about the customers of e-commerce platforms makes it possible to analyze behaviors, find purchasing patterns, develop better customer relationship management, improve stock management, optimize the organization's processes, support the creation of marketing actions, and achieve greater competitiveness and better financial performance.

E-commerce is a concept applicable to any type of business or trade transaction that allows consumers to transact goods and services electronically without barriers of time or distance. Advantages of e-commerce are: greater convenience in purchasing the product or service, no more standing in a queue or being placed on hold, 24-hour availability, access at any time from devices with an Internet connection, access to stores located remotely, easier price comparison, and reduced employee costs. Disadvantages of e-commerce are: the need for an Internet access device and connection, inability to experience the product before purchase, vulnerability of confidential data, technical problems, and possible delays or product damage during delivery.

2.4 INFORMATION RETRIEVAL:
Mohamed Reda Bouadjenek et al. [8]: Information retrieval (IR) is the science that deals with the representation, storage, organization of, and access to information items in order to satisfy user requirements concerning that information. The classic IR process can be improved, and the amount of irrelevant documents reduced, through: query reformulation, which includes expansion or reduction of the query; post-filtering or re-ranking of the retrieved documents; and improvement of the IR model, i.e. the way documents and queries are represented and matched to quantify their similarities. Query reformulation is the process of transforming an initial query Q into another query Q′. This transformation may be either a reduction or an expansion. Query reduction shortens the query so that superfluous information is removed, while query expansion enhances the query with additional information likely to occur in relevant documents.
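Query reduction and query expansion can be sketched in a few lines. This is only an illustration of the two transformations from Q to Q′; the stop-word list, the synonym table and the function names are assumptions and do not come from the surveyed work.

```python
# Hypothetical resources: terms treated as superfluous, and related terms per keyword.
STOPWORDS = {"the", "a", "an", "in", "of", "for", "with", "please", "find", "me"}
SYNONYMS = {"hotel": ["inn", "lodging"], "cheap": ["budget"]}


def reduce_query(query):
    """Query reduction: drop terms that carry no retrieval value."""
    return [t for t in query.lower().split() if t not in STOPWORDS]


def expand_query(terms):
    """Query expansion: append related terms likely to occur in relevant documents."""
    expanded = list(terms)
    for term in terms:
        for related in SYNONYMS.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


q = "please find me a cheap hotel in Binghamton"
q_reduced = reduce_query(q)        # reduction of Q
q_prime = expand_query(q_reduced)  # expansion, giving Q'
```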

Wan et al. [9]: Competitiveness is studied in the context of product design. The initial step is the definition of a dominance function that represents the value of a product, followed by identification of the demand for the product and provision of the same level across the entire domain. The goal is then to use the function to create items that are not dominated by others, or to maximize the number of items with the maximum possible dominance value. Similarly, the approach represents items as points in a multidimensional space and looks for subspaces where the appeal of the item is maximized.
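One common reading of "not dominated by others" is Pareto dominance over feature vectors. The sketch below uses that reading as an assumption; the surveyed paper may define its dominance function differently, and the hotel names and scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every feature and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(items):
    """Keep only the items that no other item dominates."""
    return {
        name: features
        for name, features in items.items()
        if not any(dominates(other, features)
                   for other_name, other in items.items() if other_name != name)
    }


# Hypothetical scores per hotel: (cleanliness, service, value).
hotels = {"A": (4, 5, 3), "B": (3, 4, 3), "C": (5, 2, 4)}
frontier = non_dominated(hotels)  # B is dominated by A; A and C remain
```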

2.5 OPINION MINING:
Marrese-Taylor et al. [10]: The overall opinion polarity is calculated and classified as positive or negative. At the sentence level, each sentence in the document is analyzed and the opinion expressed in the sentence is determined as positive, negative, or neutral. In opinion mining, the term aspect means the important features of products rated by customers (for example, in the case of a restaurant: food, service, cleanliness, etc.). Product and restaurant reviews are a mixture of positive and negative opinions about different aspects. Mining these mixed opinions needs a more fine-grained analysis of reviews, which aspect-level analysis provides. Hence aspect-based opinion mining is preferred in this work. The core tasks in aspect-based opinion mining are aspect identification, aspect-based opinion word identification, and detection of its orientation.

Vlachou et al. [11]: Top-k queries are widely applied for retrieving the k most interesting objects based on individual user preferences. Clearly, an object (product) that is highly ranked by many users (customers) has a wider visibility and impact in the market. Thus, an intuitive definition of the influence of a product in the market is the number of customers that consider it appealing (the product belongs to their top-k results) based on their preferences. Identifying the most influential objects from a given database of products is important for market analysis and decision-making and is beneficial for several real-life applications.
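The influence definition above can be made concrete with a brute-force sketch: compute each customer's top-k list and count how often each product appears. Modelling a customer preference as a weight vector over product features is an assumption here, as are the product data and function names; an efficient algorithm would avoid evaluating every customer against every product.

```python
def top_k(products, weights, k):
    """Names of the k products with the highest weighted score for one customer."""
    def score(name):
        return sum(w * v for w, v in zip(weights, products[name]))
    return sorted(products, key=score, reverse=True)[:k]


def influence(products, customers, k):
    """For each product, the number of customers whose top-k result contains it."""
    counts = {name: 0 for name in products}
    for weights in customers:
        for name in top_k(products, weights, k):
            counts[name] += 1
    return counts


# Hypothetical feature scores (location, service) and customer preference weights.
products = {"A": (4, 5), "B": (5, 2), "C": (2, 3)}
customers = [(1, 0), (0, 1), (0.5, 0.5)]
influence_scores = influence(products, customers, k=1)
```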

Ana Valdivia et al. [11]: Sentiment classification, the best-known sentiment analysis task, aims to detect sentiments within a document, a sentence, or an aspect. This task can be divided into three steps: polarity detection (label the sentiment of the text as positive, negative, or neutral), aspect selection/extraction (obtain the features for structuring the text), and classification (apply machine learning or lexicon approaches to classify the text). The detection of ironic expressions in TripAdvisor reviews is an open problem that could help to extract more valuable information. New approaches are needed to fix positivity, negativity and neutrality via consensus among sentiment analysis methods (SAMs).

Farman Ali et al. [12]: A merged ontology and SVM based recommendation and information extraction system automates the extraction of precise data from the Internet and suggests accurate items for disabled users. A number of reasonable issues are effectively considered, such as the conversion of spoken questions into the right format for a keyword-based search program. The system categorizes the retrieved information and effectively computes the polarity of the desired items that need to be recommended.

CHAPTER 3
SYSTEM DESIGN
PROPOSED SYSTEM ARCHITECTURE

Fig 3.1 Proposed System Architecture
During the mining process, unstructured data is not taken into account and much valuable service information is lost. Structured systems are those where the data is predetermined and well-defined. Customer reviews are usually unstructured data, so they need to be converted to structured data before the modified data can be used for further processing.

COMPONENTS
Data Collection

Fig 3.2 Data Collection (diagram elements: Database, Customer Reviews)
Data is collected from the customers of the hotel, who provide the reviews. Reviews obtained for a hotel may differ from customer to customer; they are based solely on the customer's preference and perspective. Reviews help both the customer and the competitor to identify the advantages and disadvantages of a specific hotel. Customer reviews about the hotel are stored in the database of the hotel. Reviews stored in the database are in the form of unstructured data.

Data Preprocessing
Data preprocessing involves the transformation of unstructured data into structured data. A customer review is unstructured data: it is often incomplete, may lack proper information, and may contain many errors. Data preprocessing is a proven method of resolving such issues. It uses NLP techniques such as tokenization, removal of irrelevant data, and stemming, and it prepares the data for further processing.

Algorithm for Data Preprocessing:
Process the reviews one by one for each hotel.
Remove whitespace or extra spaces from the review.

Remove new lines from the review.

Remove emoticons and smileys from the review.

Break the sentence into partitions.
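The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: the emoticon pattern covers only a few common ASCII smileys and some emoji ranges, "partitions" is read as sentences split at end-of-sentence punctuation, and the name `preprocess` is hypothetical.

```python
import re

# Assumed, non-exhaustive pattern: ASCII smileys such as :) ;-( =D, plus common emoji ranges.
EMOTICON = re.compile(
    r"(?<!\w)[:;=][-']?[)(\]\[DPp](?!\w)|[\U0001F300-\U0001FAFF\u2600-\u27BF]"
)


def preprocess(review):
    """Clean one review and return its sentence partitions."""
    text = review.replace("\r", " ").replace("\n", " ")  # remove new lines
    text = EMOTICON.sub(" ", text)                       # remove emoticons and smileys
    text = re.sub(r"\s+", " ", text).strip()             # remove extra whitespace
    # Break the review into partitions at sentence-ending punctuation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


parts = preprocess("Great hotel! :)\n\nNice   staff.  Would return :-(")
```

Tokenization and stemming, which the section also mentions, would follow as separate steps on each partition.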

The problem with Small-Space is that the number of subsets $\ell$ that we partition $S$ into is limited, since it has to store in memory the intermediate medians.

So, if $M$ is the size of memory, we need to partition $S$ into $\ell$ subsets such that each subset fits in memory, $n/\ell$, and so that the weighted $\ell k$ centers also fit in memory, $\ell k < M$.
Identification of Fake Review:
Fake reviews can be provided only by unauthorized users. A customer can provide a review only with a bill number, which helps to reduce fake reviews. Reviews by paid agents can also be reduced, because hotel owners will not provide bill information to the agents. It is assumed that no hotel will generate fake bills.
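The bill-number rule can be sketched as a simple check before a review is stored. The field name `bill_no`, the bill number format, and the extra rule that a bill number may be used only once are assumptions added for illustration.

```python
def accept_review(review, issued_bills, used_bills):
    """Accept a review only if its bill number was issued by the hotel and is unused."""
    bill = review.get("bill_no")
    if bill not in issued_bills or bill in used_bills:
        return False
    used_bills.add(bill)  # assumed rule: one review per bill
    return True


# Hypothetical bill numbers issued by one hotel.
issued = {"B1001", "B1002"}
used = set()
first = accept_review({"bill_no": "B1001", "text": "Nice hotel"}, issued, used)
repeat = accept_review({"bill_no": "B1001", "text": "Nice hotel again"}, issued, used)
unknown = accept_review({"bill_no": "X9999", "text": "Great!"}, issued, used)
missing = accept_review({"text": "No bill supplied"}, issued, used)
```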

Feature Selection and Extraction
Feature selection refers to selecting the most relevant attributes, and feature extraction is combining attributes into a new reduced set of features.

Algorithm for Feature Selection: Minimum Description Length is an information theoretic model selection principle. It assumes that the simplest, most compact representation of data is the best and most probable explanation of the data. It considers each attribute as a simple predictive model of the target class.

Algorithm for Feature Extraction: Non-negative Matrix Factorization (NMF) is a state of the art feature extraction algorithm. NMF is useful when there are many attributes and the attributes are ambiguous or have weak predictability. NMF produces meaningful patterns.
Pseudo code for Naive Bayes:
Step 1: Convert the data set into a frequency table.
Step 2: Create the likelihood table.

Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class.
The class with the highest posterior probability is the outcome of the prediction.

Pr(Category | Word) = Pr(Word | Category) * Pr(Category) / Pr(Word)
/* Here Category represents positive, negative or advice,
and Word represents good, bad or improve */
/* Naive Bayes is used to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and for problems having multiple classes. */
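The three steps can be sketched as a small multinomial Naive Bayes classifier. This is an illustration, not the project's implementation: the training sentences are hypothetical, add-one (Laplace) smoothing is an added assumption so that unseen words do not zero out a class, and Pr(Word) is omitted because it is the same for every category and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Step 1: build the frequency table (word counts per category, category counts)."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in samples:
        category_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, category_counts


def predict(text, word_counts, category_counts):
    """Steps 2 and 3: likelihoods and posterior (in log space); return the best category."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(category_counts.values())
    best, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)  # log Pr(Category)
        denom = sum(word_counts[category].values()) + len(vocab)
        for word in text.lower().split():
            # log Pr(Word | Category) with add-one smoothing
            score += math.log((word_counts[category][word] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical labeled reviews.
samples = [
    ("good clean room", "positive"), ("great good staff", "positive"),
    ("bad smelly room", "negative"), ("bad noisy", "negative"),
    ("improve the wifi", "advice"), ("please improve breakfast", "advice"),
]
model = train(samples)
label = predict("good staff", *model)
```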
Pseudo code for Sentiment Analysis:
For each word in the review:
    if the word is in the NegationList: negNum = -1
    else if the word is in the AdvList: advNum = 1
    else if the word is in the MainWordList:
        if the word value is Positive:
            posCount = posCount * negNum + advNum
        else:
            negCount = negCount * negNum - advNum
Add as Positive review:
    if posCount + negCount > accuracy:
        positive++
Add as Negative review:
    if posCount + negCount < -accuracy:
        negative++
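One possible runnable reading of the pseudo code is sketched below. It departs from the pseudo code in stated ways, all of which are assumptions: a single signed score stands in for posCount + negCount, each sentiment word contributes its polarity scaled by the pending adverb and negation, the modifiers are reset after each sentiment word, and a "neutral" outcome is returned when neither threshold is crossed. The word lists are hypothetical.

```python
# Hypothetical word lists standing in for NegationList, AdvList and MainWordList.
NEGATIONS = {"not", "never", "no"}
ADVERBS = {"very", "really", "extremely"}
MAIN_WORDS = {"good": 1, "great": 1, "nice": 1, "bad": -1, "noisy": -1, "disgusting": -1}


def score_review(review):
    """Signed sentiment score of one review (plays the role of posCount + negCount)."""
    score = 0
    neg_num, adv_num = 1, 0  # modifiers waiting for the next sentiment word
    for raw in review.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATIONS:
            neg_num = -1
        elif word in ADVERBS:
            adv_num = 1
        elif word in MAIN_WORDS:
            score += MAIN_WORDS[word] * (1 + adv_num) * neg_num
            neg_num, adv_num = 1, 0  # reset after the modifiers are consumed
    return score


def classify(review, accuracy=0):
    """Compare the score with the +/- accuracy threshold from the pseudo code."""
    score = score_review(review)
    if score > accuracy:
        return "positive"
    if score < -accuracy:
        return "negative"
    return "neutral"


result = classify("Not good, really noisy")
```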
3.2.4 Visualization
Visualization is the diagrammatic or statistical representation of the processed data as output. A bar chart is used, where the x-axis represents the hotels and the y-axis represents the level of rating.

CHAPTER 4
IMPLEMENTATION AND RESULTS
Implementation is done in the Java language. The algorithms can either be applied directly to a dataset or called from Java code.

The data collection process is implemented by getting reviews from the customer. The customer can choose the hotel name and provide a review. Reviews provided by the customer are based on the customer's own opinion. Reviews of the same hotel may differ from customer to customer. Customer reviews are stored in the database. Information stored in the database of a hotel is in an unstructured format.

Hotel Name                            Rating  Review
Hotel Russo Palace                    4       Good location away from the crowds
Hotel Russo Palace                    5       Great hotel with Jacuzzi bath!
Little Paradise Hotel                 5       Pure delight!
Little Paradise Hotel                 4       Nice Hotel
Fairfield Inn By Marriott Binghamton  4       Good place to visit
Fairfield Inn By Marriott Binghamton  4       Overall good
Fairfield Inn By Marriott Binghamton  3       Disappointed
Fairfield Inn By Marriott Binghamton  4       Enjoyable
Fairfield Inn By Marriott Binghamton  4       Great hotel - great location
Fairfield Inn By Marriott Binghamton  5       Good hotel.
Days Inn El Reno Ok                   1       Decent Place
Days Inn El Reno Ok                   3       Noisy
Days Inn El Reno Ok                   4       Nice hotel
Days Inn El Reno Ok                   2       Disgusting
Days Inn El Reno Ok                   2       Old smelly rooms
Table 4.1 Hotel Review Dataset
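As an illustration of how the ratings in Table 4.1 could feed the bar chart and a top-k ranking, the sketch below averages the ratings per hotel. Ranking by plain average rating is an assumption made for this example; the function names are hypothetical.

```python
# Ratings copied from Table 4.1, grouped by hotel.
ratings = {
    "Hotel Russo Palace": [4, 5],
    "Little Paradise Hotel": [5, 4],
    "Fairfield Inn By Marriott Binghamton": [4, 4, 3, 4, 4, 5],
    "Days Inn El Reno Ok": [1, 3, 4, 2, 2],
}


def average_ratings(ratings):
    """Mean rating per hotel (the bar heights for a hotels-versus-rating chart)."""
    return {hotel: sum(values) / len(values) for hotel, values in ratings.items()}


def top_k_hotels(ratings, k):
    """The k hotels with the highest average rating; ties keep table order."""
    averages = average_ratings(ratings)
    return sorted(averages, key=averages.get, reverse=True)[:k]


averages = average_ratings(ratings)
ranking = top_k_hotels(ratings, 4)
```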
Datasets are also downloaded from online sources [15, 16, 17]. The hotel dataset size is 1 GB. The downloaded datasets are also in an unstructured format. These datasets need to be converted to a structured format by preprocessing the data.

Data pre-processing involves correcting the information in the datasets and making the data complete and structured. Pre-processing is done with the help of the Weka tool, where the hotel dataset file is pre-processed. To preprocess in Weka, the dataset is supplied in the .arff file format, which is Weka's native format.
The first step is to convert the file from .csv to .arff format. Then the hotel dataset in .arff format is loaded. Weka lists the attributes in the loaded dataset file. Empty or null values in the dataset are eliminated or filtered. Finally, the processed dataset is obtained and used for further processing.
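The conversion and null-filtering steps can be illustrated without Weka by writing the ARFF text directly. This is a hand-written sketch, not Weka's own converter: the attribute typing rule (numeric if every value parses as a number, otherwise string), the relation name and the function names are assumptions.

```python
import csv
import io


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text, dropping incomplete rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    # Eliminate rows with empty or missing values.
    data = [r for r in rows[1:] if len(r) == len(header) and all(c.strip() for c in r)]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        kind = "numeric" if all(is_number(r[i]) for r in data) else "string"
        lines.append("@attribute " + name.replace(" ", "_") + " " + kind)
    lines += ["", "@data"]
    for r in data:
        # Quote string values so that embedded spaces and commas stay in one field.
        cells = [c if is_number(c) else "'" + c.replace("'", "\\'") + "'" for c in r]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


sample = (
    "Hotel Name,Rating,Review\n"
    "Little Paradise Hotel,5,Pure delight!\n"
    "Days Inn El Reno Ok,,Noisy\n"
)
arff = csv_to_arff(sample, "hotel_reviews")
```

The second sample row has an empty rating, so it is filtered out before the `@data` section is written.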

In the first phase, the data is collected and preprocessed. With the preprocessed data, each review can be identified as a positive review, a negative review, or an advice review addressed to the customer or the hotel owner (competitor).

In the second phase, feature extraction is done and the result is shown in the form of a chart that helps the customer to choose the best hotel, and also helps the competitor to establish itself by building on its strengths and identifying the services to improve in future.

SCREENSHOT
Home Page:

Reviews about each hotel:

Holiday Inn Review Page:

Taj Hotel Review Page:

Customer review page:

Customer overall rating:

CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
The best hotels are recommended to business competitors and customers. The Naive Bayes algorithm was used to identify the competitors of selected hotels. This supports improving the business and also provides the customer with appropriate competitors of the business according to the customer's needs. The proposed work helps the competitor to find ways of building the business and helps the customer to choose the best hotel that satisfies their needs.

5.2 FUTURE WORK
As a future enhancement, additional features and processes need to be considered in the algorithms for better results. The algorithm needs to be modified so that its results help other customers identify the top hotels in the city even more effectively.
