Wednesday, December 16, 2015

Data Mining test answers of 2016.

Complete, recently updated questions and correct answers for the Upwork Data Mining test (2016). The answers are updated regularly as new questions appear.



Question:* What is CRISP-DM?

Answer: • A cross-industry standard process for data mining

Question:* Which of the following is valid XML?

Answer: • All are valid

Question:* Which of these is an example of a sequential pattern relationship?

Answer: • Predicting the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes

Question:* Sharding refers to:

Answer: • partitioning a database for distribution across different servers
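As a rough illustration of the idea, the sketch below routes each record to one of several servers by hashing its key. The server names and the `shard_for` helper are hypothetical and not part of the test; real systems often use range-based or consistent-hash schemes instead.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]

def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the record key; the mapping is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# Every record with the same key always lands on the same server.
assigned = {key: shard_for(key) for key in ("user:1", "user:2", "user:3")}
```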

Question:* Which of the following is most appropriate for finding the shortest chain of friends linking two people in a social graph who are not friends with each other?

Answer: • Dijkstra's algorithm
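A minimal sketch of how this could look, assuming the social graph is stored as an adjacency dictionary and every friendship counts as one step. The `friends` data and the `shortest_chain` name are made up for the example. With unit weights like these, a plain breadth-first search would find the same chain.

```python
import heapq

def shortest_chain(graph, start, goal):
    """Dijkstra's algorithm with every friendship edge given weight 1."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for friend in graph.get(node, ()):
            nd = d + 1
            if nd < dist.get(friend, float("inf")):
                dist[friend] = nd
                prev[friend] = node
                heapq.heappush(heap, (nd, friend))
    if goal not in dist:
        return None  # the two people are not connected at all
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical friendship graph.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}
chain = shortest_chain(friends, "ann", "eve")
```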

Question:* What is a genetic algorithm?

Answer: • A search algorithm that enables us to locate an optimal binary string by processing an initial random population of binary strings and performing operations such as artificial mutation, crossover and selection.
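The three operations named in the answer can be sketched on a toy problem. The code below is an illustrative sketch only: the fitness function (count of 1 bits), the population size, the mutation rate, and the elitism step are all assumptions chosen for the example, not something the test specifies.

```python
import random

def evolve(fitness, length=12, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve binary strings toward higher fitness."""
    rng = random.Random(seed)
    # Initial random population of binary strings.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]       # selection: keep the fitter half
        children = [parents[0][:]]           # elitism: best string survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)   # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness: the number of 1 bits in the string.
best = evolve(sum)
```

Because the best string of each generation is copied forward, the best fitness never decreases from one generation to the next.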

Question:* Which XPath selector expression captures all link elements of the form 'http://example.com/profile/12345' in an HTML page while excluding all links of the form 'http://example.com/casenumber/12345'?

Answer: • //a[contains(@href, "profile")]

Question:* Which of the following is not valid JSON?

Answer: • {["answer": "this one"]}
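One way to check an answer like this is to run the candidate through a JSON parser. The sketch below uses Python's standard `json` module; the `is_valid_json` helper is a name invented for the example. The string fails because an object must contain "key": value pairs, and an array cannot appear where a key is expected.

```python
import json

def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

bad = is_valid_json('{["answer": "this one"]}')
good = is_valid_json('{"answer": "this one"}')
```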

Question:* Which industry can benefit from data mining?

Answer: • All of these

Question:* In predictive models, the values or classes to be predicted are called the:

Answer: • All of these

Question:* Data items grouped into relationships and preferences are known as:

Answer: • Clusters

Question:* True or False? Economic indicators are external data factors.

Answer: • True

Question:* What is a KDD Process?

Answer: • Knowledge Discovery in Databases

Question:* Which of the following disciplines overlaps Data Mining?

Answer: • All of the above

Question:* Which are popular data mining methods?

Answer: • All of these

Question:* Which of these are NOT types of analytical software:

Answer: • All are valid types

Question:* What is data visualization?

Answer: • The visual interpretation of complex relationships in multidimensional data

Question:* Which of the following is not a relational database?

Answer: • All of the above

Question:* Decision trees are able to handle missing values without using any impute transformation. True or False?

Answer: • True

Question:* A(n) _____ algorithm creates rules that describe how often events have occurred together.

Answer: • associative

Question:* Changes to parts of a code could lead to the problem of ______________ data.

Answer: • inconsistent

Question:* What are decision trees?

Answer: • Structures that generate rules for the classification of a dataset

Question:* The annual revenue of an international company is correlated with other attributes such as advertising, exchange rate, inflation rate, etc. Given these values (or reliable estimates of them for the next year), the company has to calculate its expected revenue for the next year. Choose the appropriate data mining task for this business problem.

Answer: • Regression

Question:* You are a credit risk manager at a retail bank. Some information about customers is available for analytics. Based on this data you have to decide whether a person will be a good or bad customer. Choose the appropriate data mining task for this business problem.

Answer: • Classification

Question:* In a neural net, to what does topology refer?

Answer: • The number of layers and the number of nodes in each layer

Question:* What is the measure of how much two random variables change together?

Answer: • covariance
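For reference, here is a small sketch of the sample covariance, assuming two equally long lists of numbers. The function name and the sample data are illustrative. A positive result means the variables tend to rise together, a negative one means they move in opposite directions.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

rising = covariance([1, 2, 3], [2, 4, 6])    # variables move together
falling = covariance([1, 2, 3], [6, 4, 2])   # variables move in opposite directions
```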

Question:* Which of the following clustering algorithms can find clusters of arbitrary shape?

Answer: • Both of these

Question:* A function used by a node in a neural net to transform input data from any domain of values into a finite range of values is known as a(n):

Answer: • Activation Function

Question:* True or False? Loose coupling data mining architecture is mainly for memory-based data mining systems that do not require high scalability and high performance.

Answer: • True

Question:* Data not collected by the organization, such as data from a proprietary database, that is combined with the organization’s own data is known as:

Answer: • Overlay

Question:* With which of these layers does a neural network start?

Answer: • Input layer

Question:* Suppose that a company's marketing department collects data from customers and wants to form customer groups so that each offer can be targeted at the most appropriate group. Choose the appropriate data mining task for this business problem.

Answer: • Segmentation

Question:* What is the front end layer of data mining architecture?

Answer: • An intuitive and user friendly user interface

Question:* To increase the confidence of your estimate of classification performance on the entire population, you should:

Answer: • Increase the size of the test dataset

Question:* Which data mining technique organizes sets of data into predefined groups?

Answer: • Classification

Question:* In the association between two variables, what is the difference between the antecedent and the consequent?

Answer: • The antecedent is on the left, the consequent on the right

Question:* A hyperplane is a

Answer: • decision boundary separating classes of data

Question:* Which of these are NOT considered internal data factors?

Answer: • Economic downturns

Question:* The level of the model that specifies (often graphically) which variables are locally dependent on each other.

Answer: • Structural Level

Question:* The algorithm powering the Google search engine is:

Answer: • PageRank

Question:* Which of these is NOT a common description of layers?

Answer: • Functional

Question:* Support Vector Machines have an advantage over Neural Networks because SVMs are

Answer: • more resistant to local minima convergence

Question:* What is Change and Deviation Detection?

Answer: • A task focusing on discovering the most significant changes in the data from previously measured or normative values

Question:* In the analysis of time-series data, the mean value over a given time period (usually some interval in the past up to the present) is called a(n)

Answer: • moving average
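A short sketch of a trailing moving average, assuming a fixed window size and a plain list of observations. The function name and the sample data are illustrative. While fewer than `window` values have been seen, the average is taken over what is available so far.

```python
from collections import deque

def moving_average(values, window):
    """Mean of the most recent `window` values at each point in the series."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)
    total = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            total -= recent[0]   # the oldest value is about to drop out
        recent.append(value)
        total += value
        averages.append(total / len(recent))
    return averages

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```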

Question:* What is Dependency Modeling?

Answer: • The process of finding a model which describes significant dependencies between variables

Question:* What is Regression?

Answer: • Learning a function that maps a data item to a real-valued prediction variable.

Question:* Which of the following storage solutions is most appropriate for a semi-structured dataset whose members do not all have the same attributes?

Answer: • MongoDB

Question:* In order to estimate classification performance on an entire population, you need _______

Answer: • disjoint training and test datasets

Question:* What is the type of data mining that drives the Amazon.com recommendation system?

Answer: • Association Learning

Question:* Which of the following algorithms is generally suitable for unsupervised learning tasks?

Answer: • k-means algorithm
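To show why no labels are needed, here is a bare-bones sketch of the k-means (Lloyd's) loop. It assumes points are tuples of numbers and, purely to keep the example deterministic, uses the first k points as initial centroids. Real implementations normally pick starting centroids more carefully.

```python
def kmeans(points, k, iterations=20):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: naive initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return centroids, clusters

# Hypothetical unlabeled data with two obvious groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = kmeans(data, k=2)
```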

Question:* True or False? Tests in CART are always Binary.

Answer: • True

Question:* Which of these are evolutionary computational methods?

Answer: • Genetic algorithms

Question:* Generalization error is a consequence of

Answer: • Overfit

Question:* A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset is:

Answer: • Nearest Neighbor
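The description in the question maps almost directly onto code. In the sketch below, the historical records, the labels, and the `knn_classify` name are invented for illustration. Distance is squared Euclidean, and the classes of the k closest records are combined by majority vote.

```python
from collections import Counter

def knn_classify(history, query, k=3):
    """Label `query` by majority vote among the k most similar historical records."""
    nearest = sorted(
        history,
        key=lambda record: sum((a - b) ** 2 for a, b in zip(record[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical historical dataset of (features, class) pairs.
history = [
    ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
    ((8, 8), "high"), ((9, 8), "high"),
]
label_a = knn_classify(history, (2, 2))
label_b = knn_classify(history, (8, 9))
```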

Question:* What is the extraction of useful if-then rules from data based on statistical significance?

Answer: • Rule Induction

Question:* In the MapReduce model, Map and Reduce functions act directly on which kind of data structure?

Answer: • key-value pair
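The role of key-value pairs is easiest to see in the classic word-count example. What follows is an in-memory toy, not the Hadoop API: the function names and the `shuffle` step are stand-ins for what a real framework does across many machines.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: take a (doc_id, text) pair and emit one (word, 1) pair per word."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reduce: take a key and its list of values, emit one (word, total) pair."""
    return word, sum(counts)

def word_count(docs):
    pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())

totals = word_count({"d1": "a b a", "d2": "b c"})
```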

Question:* What is Interestingness?

Answer: • An overall measure of pattern value, combining validity, novelty, usefulness, and simplicity.

Question:* True or False? The MARS algorithm cannot produce rules.

Answer: • True

Question:* In which type of analysis is a Kohonen feature map typically employed?

Answer: • Cluster analysis

Question:* What is Classification?

Answer: • Learning a function that maps a data item into one of several predefined groups.

Question:* Which of the following is NOT a common source system?

Answer: • Node

Question:* A DBMS reduces data redundancy and inconsistency by

Answer: • Enforcing referential integrity

Question:* Which of the following clustering algorithms can optimize an objective function?

Answer: • k-means and CLARANS

Question:* Which of the following is not a common goal of the KDD Process:

Answer: • Performance

Question:* What is Clustering?

Answer: • A descriptive task where one seeks to identify a finite set of categories to describe the data.

Question:* Which of the following is NOT a function of data warehouses?

Answer: • Cleaning dirty data

Question:* In Natural Language Processing, what is the role of a lexical analyzer?

Answer: • splits the stream of input characters into tokens
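A very small sketch of that step, assuming a simple regular-expression tokenizer is good enough for the illustration. The pattern below is an assumption of this example: it treats numbers (with an optional decimal part), runs of word characters, and single punctuation marks as tokens.

```python
import re

# Numbers with an optional decimal part, then words, then any single symbol.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")

def tokenize(text):
    """Split a character stream into a list of tokens, dropping whitespace."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("Data mining costs $3.50, right?")
```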

Question:* Which of the following properties is a constraint on a RESTful application?

Answer: • stateless

Question:* What is Summarization?

Answer: • Methods for finding a compact description for a subset of data.

Question:* Which of the following is NOT a method of combining multiple models into an ensemble model?

Answer: • Bootstrapping

Question:* The component of the Hadoop Distributed Filesystem responsible for storing metadata is called the

Answer: • Namenode

Question:* Converted information to provide insights about historical patterns and future trends is known as:

Answer: • Knowledge

Question:* Which of the following properties applies to Single-Layer Perceptrons?

Answer: • random initialization of weights

Question:* Which of the following applications are usually used to classify students' performances?

Answer: • If...then... analysis

Question:* The authentication protocol used by many significant web APIs is called:

Answer: • OAuth

Question:* In any numerical data set with a meaningful mean value, what is the minimum fraction of data that will fall within n standard deviations of the mean?

Answer: • 1-1/n^2
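This bound is Chebyshev's inequality. The sketch below compares the bound with the fraction actually observed in a small made-up dataset that contains one extreme value. The helper names and the data are assumptions of this example, and the population standard deviation is used.

```python
import statistics

def chebyshev_lower_bound(n):
    """Minimum fraction of data within n standard deviations of the mean."""
    return max(0.0, 1 - 1 / n ** 2)

def fraction_within(data, n):
    """Observed fraction of values within n standard deviations of the mean."""
    mean = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mean) <= n * sigma for x in data) / len(data)

# Hypothetical data with one extreme value.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 50]
checks = [(n, chebyshev_lower_bound(n), fraction_within(sample, n))
          for n in (1.5, 2, 3)]
```

For n = 2 the bound is 0.75, so at least three quarters of any such dataset lies within two standard deviations of its mean.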

Question:* What is CURL?

Answer: • A command-line tool for retrieving files

Question:* Which of these is a possible architecture of a data mining system?

Answer: • No-coupling

Question:* What is the first step in the business understanding phase?

Answer: • Firmly grasp business objectives and needs

Question:* Taking multiple random samples of data and building a classification model for each is known as:

Answer: • Boosting

Question:* What is Pig?

Answer: • A programming language that simplifies the common tasks of working with Hadoop.

Question:* A commonly used continuous alternative to the step function in multi-layered neural network output is the

Answer: • logistic function
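For comparison, here is a sketch of the step function next to the logistic function. The logistic version is written in a numerically careful way so that large negative inputs do not overflow. The function names are illustrative.

```python
import math

def step(x):
    """Hard threshold: jumps from 0 to 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0

def logistic(x):
    """Smooth, continuous squashing of any real input into the range (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)

outputs = [(x, step(x), logistic(x)) for x in (-5, -1, 0, 1, 5)]
```

Unlike the step function, the logistic function is differentiable everywhere, which is what gradient-based training relies on.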

Question:* Which of the following algorithms produces decision trees?

Answer: • ID3

Question:* Which of these is not a step in the KDD process?

Answer: • Data Quantification

Question:* "In 2% of the purchases at the hardware store, both a pick and a shovel were bought," is an example of:

Answer: • Support
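Support is simply the share of transactions that contain every item in the set. A minimal sketch follows, using a made-up list of 50 purchases in which exactly one contains both items.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)

# Hypothetical purchase log: 1 of 50 baskets holds both a pick and a shovel.
purchases = [{"pick", "shovel"}] + [{"nails"}] * 49
pick_and_shovel = support(purchases, {"pick", "shovel"})
```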

Question:* Apriori is a seminal algorithm for finding frequent item sets using:

Answer: • Candidate generation

Question:* If more than one value occurs the same number of times, the data is:

Answer: • Multi-modal

Question:* The level of the model that specifies the strengths of the dependencies using some numerical scale.

Answer: • Quantitative Level

Question:* Which of the following methods can be used for modeling a categorical target variable?

Answer: • Logistic Regression

Question:* Which of the following is not a primary phase of a Hadoop Reducer?

Answer: • Map

Question:* The measured differences between a model and its predictions are known as:

Answer: • Noise

Question:* Which decision tree method performs multi-level splits when computing classification trees?

Answer: • CHAID (Chi Square Automatic Interaction Detection)

Question:* True or False? Artificial neural networks are linear predictive models.

Answer: • False

Question:* Which of the following is not an appropriate tool for harvesting data from a website that accesses its database through Javascript/AJAX calls?

Answer: • wget

Question:* What is the advantage of the k-Medoids Clustering Algorithm over the k-Means Clustering (Lloyd's) Algorithm?

Answer: • more resistant to outliers
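A one-dimensional toy example makes the difference visible. The mean (the center k-means uses) is dragged toward an extreme value, while the medoid must be an actual data point and stays with the bulk of the data. The data and function names below are illustrative, and absolute difference is assumed as the distance.

```python
def mean_center(values):
    """Cluster center as used by k-means: the arithmetic mean."""
    return sum(values) / len(values)

def medoid(values):
    """Cluster center as used by k-medoids: the member with the least total distance to the others."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

# Hypothetical cluster with a single outlier.
cluster = [1, 2, 3, 4, 100]
by_mean = mean_center(cluster)
by_medoid = medoid(cluster)
```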

Question:* Which of the following is part of a retail customer data mining strategy?

Answer: • loyalty cards

Question:* The two major functions of BI servers are:

Answer: • Management and delivery

Question:* How do you measure interestingness in association patterns?

Answer: • measure lift
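Lift compares how often two itemsets occur together with how often they would occur together if they were independent. The sketch below uses made-up transactions and a hypothetical `lift` helper. A value above 1 suggests a positive association, and a value near 1 suggests none.

```python
def lift(transactions, antecedent, consequent):
    """support(A and C) / (support(A) * support(C))."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= set(t) for t in transactions)
    count_c = sum(c <= set(t) for t in transactions)
    count_both = sum((a | c) <= set(t) for t in transactions)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return (count_both / n) / ((count_a / n) * (count_c / n))

# Hypothetical baskets.
baskets = [{"pick", "shovel"}, {"pick", "shovel"}, {"pick"}, {"nails"}]
pick_to_shovel = lift(baskets, {"pick"}, {"shovel"})
```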

Question:* Where can a website operator generally find data on her customers' IP addresses?

Answer: • server logfiles

Question:* Hash-based technique, Transaction Reduction, Partitioning, Sampling, and Dynamic Itemset Counting are all examples of what?

Answer: • Techniques to improve the efficiency of an Apriori algorithm

Question:* Data mining provides a link between:

Answer: • Separate transactional and analytical systems

Question:* A descriptive approach to exploring data that can help identify relationships among values in a database is:

Answer: • Link analysis

Question:* What is Hive?

Answer: • Hive enables Hadoop to operate as a data warehouse.

Question:* What is the purpose of the Hadoop Distributed File System (HDFS)?

Answer: • To enable computation to take place by allowing each server to have access to the data.

Question:* The silhouette coefficient can be used to determine the natural number of clusters for ________.

Answer: • Partitioning Algorithms


