Online Test-Exam Upwork (oDesk) and Freelancer: Data Mining 2015

1. What is CRISP-DM?

Answers:

• Microsoft's linear regression algorithm

• A six phase method for predicting e-commerce buying habits

• A decision tree developed in the 1980's but almost entirely replaced by the CART method today

• A cross-industry standard process for data mining

2. Which of the following is valid XML?

Answers:

• <valid>This One</valid>

• All are valid

• <body answer="valid">This One</body>

• <valid>"This One"</valid>
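
One way to check snippets like these is to feed each one to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is our own, and it tests well-formedness only (no DTD or schema validation is attempted).

```python
import xml.etree.ElementTree as ET

def is_well_formed(snippet):
    """Return True if the snippet parses as a single XML element tree."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False

snippets = [
    '<valid>This One</valid>',
    '<body answer="valid">This One</body>',
    '<valid>"This One"</valid>',
]
results = [is_well_formed(s) for s in snippets]
print(results)
```

Quotation marks inside element content and quoted attribute values are both ordinary XML, so a parser treats the three snippets alike.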

3. Which of these is an example of a sequential pattern relationship?

Answers:

• Placing two frequently purchased items next to each other on the shelf

• Reorganizing your basketball team's starting lineup based on an analysis of performance

• Using business experience and gut instinct to design a new floorplan in a grocery store

• Predicting the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes

4. Sharding refers to:

Answers:

• none of the above

• simultaneously accessing multiple object databases over SSH

• partitioning a database for distribution across different servers

• a measure of the noise in a database's contents
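
As a rough illustration of partitioning records across servers, the sketch below routes each record key to a shard by hashing it. The function name `shard_for` and the server names are hypothetical, and hash-modulo routing is only one of several possible schemes (range-based and directory-based partitioning are others).

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

servers = ["db-0", "db-1", "db-2"]  # hypothetical server names
placement = {
    user: servers[shard_for(user, len(servers))]
    for user in ["alice", "bob", "carol"]
}
print(placement)
```

Because the hash is deterministic, the same key always lands on the same server, which is what lets a client find a record again without a central lookup table.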

5. Which of the following is most appropriate for finding the shortest chain of friends linking two people in a social graph who are not friends with each other?

Answers:

• Neural Networks

• k-means algorithm

• Dijkstra's algorithm

• Markov chains
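
For reference, a shortest-path search over a friendship graph can be sketched as follows. This is a minimal version of Dijkstra's algorithm that assumes every friendship edge has weight 1 (in which case a plain breadth-first search finds the same chain); the graph and the names in it are made up for illustration.

```python
import heapq

def shortest_chain(graph, start, goal):
    """Dijkstra's algorithm with every friendship edge weighted 1."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, person = heapq.heappop(heap)
        if person == goal:
            break
        if d > dist.get(person, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for friend in graph.get(person, ()):
            nd = d + 1
            if nd < dist.get(friend, float("inf")):
                dist[friend] = nd
                prev[friend] = person
                heapq.heappush(heap, (nd, friend))
    if goal not in dist:
        return None  # the two people are not connected at all
    chain = [goal]
    while chain[-1] != start:
        chain.append(prev[chain[-1]])
    return chain[::-1]

friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}
chain = shortest_chain(friends, "ann", "eve")
print(chain)
```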

6. What is a genetic algorithm?

Answers:

• An algorithm that estimates how well a particular pattern (a model and its parameters) meets the criteria of the KDD process. Evaluation of predictive accuracy (validity) is based on cross validation. Evaluation of descriptive quality involves predictive a

• A classic algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item s

• A search algorithm that locates an optimal binary string by processing an initial random population of binary strings with operations such as artificial mutation, crossover and selection.
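
The mutation, crossover and selection operations can be sketched on a toy problem. The example below evolves binary strings toward the all-ones string (the "OneMax" problem); the function name, tournament selection, single-point crossover, elitism and every parameter value are our own assumptions, chosen only to keep the sketch short.

```python
import random

def genetic_onemax(length=20, pop_size=30, generations=40,
                   mutation_rate=0.02, seed=0):
    """Evolve binary strings; fitness is the number of 1 bits."""
    rng = random.Random(seed)
    fitness = sum
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]

    def select(pop):
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [max(population, key=fitness)]  # elitism: keep the best
        while len(next_gen) < pop_size:
            p1, p2 = select(population), select(population)
            cut = rng.randint(1, length - 1)       # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

best = genetic_onemax()
print(sum(best), best)
```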

7. Which XPath selector expression captures all link elements of the form 'http://example.com/profile/12345' in an HTML page while excluding all links of the form 'http://example.com/casenumber/12345'?

Answers:

• //a[contains(@href, "profile")]/@href

• //a[contains(@href, "profile")]

• //href/profile

• //a/profile

8. Which of the following is not valid JSON?

Answers:

• {["answer": "this one"]}

• {"answer": ["this one"]}

• All are valid

• {"answer": "this one"}
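
Candidate strings like these can be checked mechanically with a JSON parser. The sketch below uses Python's standard `json` module; the helper name `is_valid_json` is our own.

```python
import json

def is_valid_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

candidates = [
    '{["answer": "this one"]}',
    '{"answer": ["this one"]}',
    '{"answer": "this one"}',
]
validity = [is_valid_json(c) for c in candidates]
print(validity)
```

An object must contain `"key": value` pairs directly, so a parser rejects an opening `{` that is followed immediately by `[`.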

9. Which industry can benefit from data mining?

Answers:

• Manufacturing

• Retail

• Finance/Banking

• All of these

10. In predictive models, the values or classes to be predicted are called the:

Answers:

• All of these

• Response

• Target variables

• Dependent

11. Data items grouped into relationships and preferences are known as:

Answers:

• Clusters

• Predictable Sets

• Functional Organizations

• Degrees of Fit

12. True or False? Economic indicators are external data factors.

Answers:

• True

• False

13. What is a KDD Process?

Answers:

• Knoop-hardness measured through high-impact dimension

• Differential Decryption

• K-mean Data Discovery

• Knowledge Discovery in Databases

14. Which of the following disciplines overlaps Data Mining?

Answers:

• All of the above

• Statistics

• Artificial Intelligence

• Linguistics

15. Which are popular data mining methods?

Answers:

• Probabilistic Graphical Dependency Models

• Relational Learning Models

• All of these

• Decision Trees and Rules

16. Which of these are NOT types of analytical software:

Answers:

• Machine learning

• Neural network

• Statistical

• All are valid types

17. What is data visualization?

Answers:

• A structured and developed prediction of data results

• The visual interpretation of complex relationships in multidimensional data

• The technical term for the act of data being stored in a server

18. Which of the following is not a relational database?

Answers:

• Google Big Table

• MongoDB

• Apache Cassandra

• All of the above

19. Decision trees are able to handle missing values without using any impute transformation. True or False?

Answers:

• False

• True

20. Which of the following is valid XML?

Answers:

• <valid>This One</valid>

• <valid>"This One"</valid>

• <body answer="valid">This One</body>

• All are valid

21. A(n) _____ algorithm creates rules that describe how often events have occurred together.

Answers:

• associative

• pruning

• CHAID

• artificial

22. Changes to parts of a code could lead to the problem of ______________ data.

Answers:

• inconsistent

• dirty

• granular

• nonintegrated

23. What are decision trees?

Answers:

• Structures that generate rules for the classification of a dataset

• Hierarchical dimensions that can be created with a hyper cube browser

• Data not collected by the organization, such as data available from a reference book

• Complex reports generated by a qualified data scientist

24. The annual revenue of an international company is correlated with other attributes such as advertising, exchange rate and inflation rate. Given these values (or reliable estimates of them for the next year), the company has to calculate its expected revenue for the next year. Choose the appropriate data mining task for this business problem.

Answers:

• Segmentation

• Regression

• Classification

25. You are a credit risk manager at a retail bank. Some information about customers is available for analysis. Based on this data, you have to decide whether a person will be a good or a bad customer. Choose the appropriate data mining task for this business problem.

Answers:

• Regression

• Segmentation

• Classification

26. What is CRISP-DM?

Answers:

• A cross-industry standard process for data mining

• A six phase method for predicting e-commerce buying habits

• Microsoft's linear regression algorithm

• A decision tree developed in the 1980's but almost entirely replaced by the CART method today

27. In a neural net, to what does topology refer?

Answers:

• The number of layers and the number of nodes in each layer

• The graphical visualization of the data

• The number of nodes utilized

• The range of variables in a set

28. What is the measure of how much two random variables change together?

Answers:

• stochastic inertia

• covariance

• polyconvergence

• binary standard deviation
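
For concreteness, the covariance of two variables can be computed by hand as below. This sketch uses the sample form (dividing by n - 1); the variable names and figures are invented for illustration.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

ad_spend = [1.0, 2.0, 3.0, 4.0]   # made-up figures
revenue = [2.0, 4.0, 6.0, 8.0]
print(covariance(ad_spend, revenue))
```

A positive result means the two variables tend to rise together; a negative result means one tends to fall as the other rises.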

29. Which of the following clustering algorithms can find clusters of arbitrary shape?

Answers:

• Single-Link

• DBSCAN

• None of these

• Both of these

30. A function used by a node in a neural net to transform input data from any domain of values into a finite range of values is known as a(n):

Answers:

• Activation Function

• Chi-square

• Confusion matrix

• Antecedent

31. True or False? Loose coupling data mining architecture is mainly for memory-based data mining systems that do not require high scalability and high performance.

Answers:

• False

• True

32. Data not collected by the organization, such as data from a proprietary database, that is combined with the organization’s own data is known as:

Answers:

• Non-applicable data

• Noise

• Overlay

• Overfitting

33. With which of these layers does a neural network start?

Answers:

• Input layer

• Hidden Layer

• Output Layer

• Transparent layer

34. Suppose that the company's marketing department collects data from customers and wants to group them so that each offer can be targeted at the most appropriate group. Choose the appropriate data mining task for this business problem.

Answers:

• Segmentation

• Classification

• Regression

35. What is the front end layer of data mining architecture?

Answers:

• An intuitive and user friendly user interface

• The team of programmers who designed the software utilized in a particular mining project

• The hardware designed specifically for storage of massive amounts of data

• Firewalls established to protect data from malicious sources

36. To increase confidence in your estimate of classification performance on the entire population, you should:

Answers:

• Increase the size of the training dataset

• Decrease the size of the test dataset

• Increase the size of the test dataset

• Decrease the size of the training dataset

37. Which data mining technique organizes sets of data into predefined groups?

Answers:

• Sequential Patterning

• Clustering

• Classification

• Gamification

38. In the association between two variables, what is the difference between the antecedent and the consequent?

Answers:

• The antecedent is on the left, the consequent on the right

• Nothing, they are interchangeable

• The antecedent is always a very complex variable

• The antecedent is on the right, the consequent is on the left.

39. A hyperplane is a

Answers:

• non-terminating error condition

• variant of the C4.5 algorithm

• collection of linked hypertext files

• decision boundary separating classes of data

40. Which of these are NOT considered internal data factors?

Answers:

• Staff Skills

• Economic downturns

• Product Positioning

• Price

41. The level of the model that specifies (often graphically) which variables are locally dependent on each other.

Answers:

• Structural Level

• Quantitative Level

• Qualitative Level

• Primary Level

42. The algorithm powering the Google search engine is:

Answers:

• AdaBoost

• The Brin-Page Method

• PageRank

• GoogleCrawler

43. Which of these is NOT a common description of layers?

Answers:

• Functional

• Input

• Hidden

• Output

44. Support Vector Machines have an advantage over Neural Networks because SVMs are

Answers:

• more resistant to local minima convergence

• parametric

• none of the above

• easier to train via online learning

45. Which of these is an example of a sequential pattern relationship?

Answers:

• Using business experience and gut instinct to design a new floorplan in a grocery store

• Reorganizing your basketball team's starting lineup based on an analysis of performance

• Placing two frequently purchased items next to each other on the shelf

• Predicting the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes

46. What is Change and Deviation Detection?

Answers:

• The process of finding a model which describes significant dependencies between variables

• Methods for finding a compact description for a subset of data.

• A task which consists of techniques for estimating, from data, the joint multi-variate probability density function of all of the variables/fields in the database.

• A task focusing on discovering the most significant changes in the data from previously measured or normative values

47. In the analysis of time-series data, the mean value over a given time period (usually some interval in the past up to the present) is called a(n)

Answers:

• unbiased mean

• partial average

• compounded mean

• moving average
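
As an illustration, the mean over a sliding window of the most recent observations can be computed like this. The function name and the sample series are our own, and this is the simple unweighted form.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values, oldest first."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]

daily_sales = [10, 12, 14, 16, 18]  # made-up figures
print(moving_average(daily_sales, 3))
```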

48. Sharding refers to:

Answers:

• none of the above

• partitioning a database for distribution across different servers

• a measure of the noise in a database's contents

• simultaneously accessing multiple object databases over SSH

49. What is Dependency Modeling?

Answers:

• The process of finding a model which describes significant dependencies between variables

• A multi-step process involving data preparation, pattern searching, knowledge evaluation, and refinement with iteration after modification.

• A task which consists of techniques for estimating, from data, the joint multi-variate probability density function of all of the variables/fields in the database.

• Learning a function that maps a data item into one of several predefined groups or clusters.

50. What is Regression?

Answers:

• Learning a function that maps a data item to a real-valued prediction variable.

• An expression E in a language L describing facts in a subset FE of F.

• Learning a function that maps a data item into one of several predefined groups.

• A descriptive task where one seeks to identify a finite set of categories to describe the data.

51. Which of the following storage solutions is most appropriate for a semi-structured dataset whose members do not all have the same attributes?

Answers:

• MariaDB

• MongoDB

• SQLite

• MySQL

52. In order to estimate classification performance on an entire population, you need _______

Answers:

• disjoint training and test datasets

• Disjoint training

• (None of these)

• Test Datasets

53. What is the type of data mining that drives the Amazon.com recommendation system?

Answers:

• Association Learning

• Anomaly Detection

• Clustering Algorithms

• Fuzzy Logic

54. Which of the following algorithms is generally suitable for unsupervised learning tasks?

Answers:

• Restricted Boltzmann machine

• k-nearest neighbor

• info-fuzzy networks

• k-means algorithm

55. True or False? Tests in CART are always Binary.

Answers:

• True

• False

56. Which of these are evolutionary computational methods?

Answers:

• Heuristic algorithms

• Bayesian inference algorithms

• Genetic algorithms

• Clustering algorithms

57. Generalization error is a consequence of

Answers:

• Poorly defined Chernoff Bound

• Underfit

• Parametric analysis

• Overfit

58. A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset is:

Answers:

• Decision Treeing

• Nearest Neighbor

• Association Model Query

• Logistic Regression
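
The technique described in this question can be sketched in a few lines: find the k historical records closest to the new record and take a majority vote of their classes. The Euclidean distance, the function name and the toy records below are assumptions for illustration.

```python
from collections import Counter
import math

def knn_classify(history, query, k=3):
    """Label `query` by majority vote among its k closest historical records."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (features, class) pairs; the values are invented
history = [
    ((1.0, 1.0), "good"),
    ((1.2, 0.8), "good"),
    ((0.9, 1.1), "good"),
    ((5.0, 5.0), "bad"),
    ((5.2, 4.9), "bad"),
]
print(knn_classify(history, (1.1, 1.0)))
print(knn_classify(history, (5.1, 5.0)))
```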

59. What is the extraction of useful if-then rules from data based on statistical significance?

Answers:

• Dynamic Information Inference

• Preliminary Method Mapping

• Rule Induction

• Fuzzy Logic Application

60. What is a genetic algorithm?

Answers:

61. In the MapReduce model, Map and Reduce functions act directly on which kind of data structure?

Answers:

• key-value pair

• linked lists

• MySQL matrices

• relational databases
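
To make the data flow concrete, here is an in-memory imitation of a MapReduce word count. It does not use the Hadoop API; the function names `map_phase`, `shuffle` and `reduce_phase` are our own, and the point is only to show what each stage consumes and emits.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: (doc_id, text) -> a stream of (word, 1) pairs."""
    for word in text.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    return (word, sum(counts))

docs = {1: "data mining finds patterns", 2: "mining data at scale"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(word_counts)
```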

62. What is Interestingness?

Answers:

• A multi-step process involving data preparation, pattern searching, knowledge evaluation, and refinement with iteration after modification.

• An expression E in a language L describing facts in a subset FE of F.

• A discovered pattern that is true on new data with some degree of certainty, and generalizes to other data.

• An overall measure of pattern value, combining validity, novelty, usefulness, and simplicity.

63. Which of the following is most appropriate for finding the shortest chain of friends linking two people in a social graph who are not friends with each other?

Answers:

• Markov chains

• Neural Networks

• Dijkstra's algorithm

• k-means algorithm

64. True or False? The MARS algorithm cannot produce rules.

Answers:

• True

• False

65. In which type of analysis is a Kohonen feature map typically employed?

Answers:

• Cluster analysis

• Exploratory data analysis

• Descriptive modeling analysis

• Predictive analysis

66. What is Classification?

Answers:

• Learning a function that maps a data item into one of several predefined groups.

• Methods for finding a compact description for a subset of data.

• A discovered pattern that is true on new data with some degree of certainty, and generalizes to other data.

• A descriptive task where one seeks to identify a finite set of categories to describe the data.

67. Which of the following is NOT a common source system?

Answers:

• Node

• SAP source

• DB Connect

• UDC

68. A DBMS reduces data redundancy and inconsistency by

Answers:

• Utilizing a data dictionary

• Enforcing referential integrity

• uncoupling program and data

• Minimizing isolated files with repeated data

69. Which of the following clustering algorithms can optimize an objective function?

Answers:

• k-means only

• Subspace Clustering Algorithms

• DBSCAN and Single Link

• k-means and CLARANS

70. Which of the following is not a common goal of the KDD Process:

Answers:

• Prediction

• Performance

• Description

71. What is Clustering?

Answers:

• A descriptive task where one seeks to identify a finite set of categories to describe the data.

• A task which consists of techniques for estimating, from data, the joint multi-variate probability density function of all of the variables/fields in the database.

• Learning a function that maps a data item into one of several predefined groups or clusters.

• The process of finding a model which describes significant dependencies between variables

72. Which of the following is NOT a function of data warehouses?

Answers:

• Extracting data

• Cleaning dirty data

• Storing purchased data

• Cleaning data

73. In Natural Language Processing, what is the role of a lexical analyzer?

Answers:

• processes the parse tree for semantic meaning

• generates a context-free grammar

• checks the validity of a token

• splits the stream of input characters into tokens
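
A very small tokenizer illustrates the idea of turning a character stream into tokens. The regular expression below (runs of word characters, or single punctuation marks) is a simplifying assumption; real lexical analyzers use richer rules.

```python
import re

# A token is either a run of word characters or one punctuation character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Split a string of characters into a list of tokens."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("Clusters, not classes.")
print(tokens)
```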

74. Which of the following properties is a constraint on a RESTful application?

Answers:

• stateless

• linearly separable

• returns JSON output

• stateful

75. What is Summarization?

Answers:

• A descriptive task where one seeks to identify a finite set of categories to describe the data.

• Methods for finding a compact description for a subset of data.

• A task focusing on discovering the most significant changes in the data from previously measured or normative values

• The process of finding a model which describes significant dependencies between variables

76. Which of the following is NOT a method of combining multiple models into an ensemble model?

Answers:

• Bootstrapping

• Averaging

• Stacking

• Voting

77. The component of the Hadoop Distributed Filesystem responsible for storing metadata is called the

Answers:

• Datanode

• FS Shell

• Namenode

• DFSAdmin

78. Information that has been converted to provide insights about historical patterns and future trends is known as:

Answers:

• Clustering

• Linear regression

• Meta-data

• Knowledge

79. Which of the following properties applies to Single-Layer Perceptrons?

Answers:

• continuous output

• random initialization of weights

• backpropagation

• able to learn non-linear separations

80. Which of the following applications are usually used to classify students' performances?

Answers:

• Market-basket analysis

• Regression analysis

• Cluster analysis

• If...then... analysis

81. The authentication protocol used by many significant web APIs is called:

Answers:

• OAuth

• HTTPS

• SSL

• PGP

82. In any numerical data set with a meaningful mean value, what is the minimum fraction of data that will fall within n standard deviations of the mean?

Answers:

• 1/n^2

• 1-1/n^2

• 1/n

• 1/2n
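
The bound this question refers to is Chebyshev's inequality: for n > 1, at least 1 - 1/n^2 of the values lie within n standard deviations of the mean, whatever the distribution. The sketch below checks it empirically on a small made-up data set with one large outlier; the helper name `fraction_within` is our own.

```python
import statistics

def fraction_within(data, n):
    """Fraction of values within n population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mean) <= n * sd)
    return inside / len(data)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]  # invented values, heavily skewed
for n in (2, 3):
    observed = fraction_within(data, n)
    bound = 1 - 1 / n**2
    print(n, observed, bound, observed >= bound)
```

The bound is usually loose: well-behaved data sets tend to have far more of their values near the mean than the inequality guarantees.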

83. What is CURL?

Answers:

• A command-line tool for retrieving files

• A methodology for classifying hidden features of data

• The part of HTTP that specifies access permission

• Combinatorial Unsupervised Recursive Learning algorithm

84. Which of these is a possible architecture of a data mining system?

Answers:

• Transitive coupling

• Quickstart coupling

• No-coupling

• Magnetic coupling

85. Which XPath selector expression captures all link elements of the form 'http://example.com/profile/12345' in an HTML page while excluding all links of the form 'http://example.com/casenumber/12345'?

Answers:

• //a/profile

• //a[contains(@href, "profile")]

• //a[contains(@href, "profile")]/@href

• //href/profile

86. What is the first step in the business understanding phase?

Answers:

• Create data mining goals to achieve the business objectives

• Firmly grasp business objectives and needs

• Create a list of all relevant algorithms to be applied to the task

• Assess the current situation by finding out the resources, assumptions, constraints etc.

87. Taking multiple random samples of data and building a classification model for each is known as:

Answers:

• Binning

• Fuzzy Sampling

• Boosting

• Clustering

88. What is Pig?

Answers:

• A programming language that enables Hadoop to operate as a data warehouse.

• A programming language that simplifies the common tasks of working with Hadoop.

• None of these

89. A commonly used continuous alternative to the step function in multi-layered neural network output is the

Answers:

• logarithmic function

• hyperbolic function

• logistic function

• multi-layered NN cannot compute continuous output
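
To compare a step function with a smooth alternative, the sketch below defines both. The logistic (sigmoid) function maps any real input into the open interval (0, 1); the branch on the sign of `x` is our own addition to avoid overflow for large negative inputs.

```python
import math

def step(x):
    """Hard threshold: jumps from 0 to 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0

def logistic(x):
    """Smooth, differentiable squashing function with range (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)

for x in (-5.0, 0.0, 5.0):
    print(x, step(x), logistic(x))
```

Unlike the step function, the logistic function has a usable derivative everywhere, which is what gradient-based training such as backpropagation relies on.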

90. Which of the following algorithms produces decision trees?

Answers:

• DBSCAN

• ID3

• none of the above

• logistic regression

91. Which of these is not a step in the KDD process?

Answers:

• Data Mining

• Data Cleaning

• Data Integration

• Data Quantification

92. "In 2% of the purchases at the hardware store, both a pick and a shovel were bought" is an example of:

Answers:

• Supervised learning

• Validation

• Support

• Topology
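
The statistic quoted in this question, the share of all transactions that contain a given set of items, can be computed as below. The function name `itemset_share` and the 100 made-up transactions are assumptions, constructed so that exactly 2 of them contain both a pick and a shovel.

```python
def itemset_share(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)

# 100 invented purchases: 2 contain both a pick and a shovel
transactions = (
    [{"pick", "shovel"}] * 2
    + [{"shovel"}] * 38
    + [{"hammer"}] * 60
)
print(itemset_share(transactions, {"pick", "shovel"}))
print(itemset_share(transactions, {"shovel"}))
```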

93. Apriori is a seminal algorithm for finding frequent item sets using:

Answers:

• Normal mixture models

• Overfitting methods

• Candidate generation

• None of these
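
A compact sketch of Apriori on a toy data set follows. It counts 1-item sets first, then repeatedly joins the frequent (k-1)-item sets into k-item candidates and discards any candidate with an infrequent subset. The function name, the data and the 0.5 threshold are invented, and no efficiency tricks are attempted.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support} for the given transactions."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the threshold.
        level = {}
        for cand in current:
            share = sum(1 for t in transactions if cand <= t) / n
            if share >= min_support:
                level[cand] = share
        frequent.update(level)
        k += 1
        # Candidate generation: join pairs, then prune by subset frequency.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent

baskets = [
    {"pick", "shovel", "gloves"},
    {"pick", "shovel"},
    {"pick", "gloves"},
    {"shovel"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, share in frequent.items():
    print(sorted(itemset), share)
```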

94. If more than one value occurs the same number of times, the data is:

Answers:

• Multivariated

• Multi-faceted

• Multi-modal

• Multi-leafed
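
Python's standard library can report every most-frequent value in a data set, which makes ties easy to spot. The sample values are invented; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

values = [1, 2, 2, 3, 3, 4]
modes = statistics.multimode(values)  # all values tied for the highest count
print(modes)
```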

95. The level of the model that specifies the strengths of the dependencies using some numerical scale.

Answers:

• Numeric Level

• Primary Level

• Quantitative Level

• Dependency Level

96. Which of the following methods can be used for modeling a categorical target variable?

Answers:

• Non-Linear Regression

• All of the Above

• Regression

• Logistic Regression

• ARIMA

97. Which of the following is not a primary phase of a Hadoop Reducer?

Answers:

• Shuffle

• Reduce

• Map

• Sort

98. The measured differences between a model and its predictions are known as:

Answers:

• Outliers

• Range

• Non-applicable data

• Noise

99. Which decision tree method performs multi-level splits when computing classification trees?

Answers:

• C4.5 algorithm

• ID3 (Iterative Dichotomiser 3)

• CHAID (Chi Square Automatic Interaction Detection)

• CART (Classification and Regression Trees)

100. True or False? Artificial neural networks are linear predictive models.

Answers:

• False

• True

101. Which of the following is not an appropriate tool for harvesting data from a website that accesses its database through JavaScript/AJAX calls?

Answers:

• PhantomJS

• wget

• Selenium

• All of the above are appropriate

102. What is the advantage of the k-Medoids Clustering Algorithm over the k-Means Clustering (Lloyd's) Algorithm?

Answers:

• represents clusters by center

• all of the above

• more resistant to outliers

• uses iterative refinement

103. Which of the following is not valid JSON?

Answers:

• All are valid

• {["answer": "this one"]}

• {"answer": ["this one"]}

• {"answer": "this one"}

104. Which of the following is part of a retail customer data mining strategy?

Answers:

• holiday sale

• customer testimonials

• loyalty cards

• money-back guarantee

105. The two major functions of BI servers are:

Answers:

• Management and delivery

• Processing and management

• Source and results

• Application and delivery

106. How do you measure interestingness in association patterns?

Answers:

• measure lift

• measure accuracy

• measure variance

• measure relevance
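
As a worked example of one of the measures listed, lift compares how often two item sets occur together with how often they would co-occur if they were independent. The function name and the four made-up baskets are assumptions for illustration.

```python
def lift(transactions, antecedent, consequent):
    """P(A and C) / (P(A) * P(C)) over a list of transaction sets."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_both = sum(1 for t in transactions if (a | c) <= t)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return (count_both / n) / ((count_a / n) * (count_c / n))

baskets = [
    {"sleeping bag", "hiking shoes", "backpack"},
    {"sleeping bag", "backpack"},
    {"hiking shoes"},
    {"bread"},
]
print(lift(baskets, {"sleeping bag"}, {"backpack"}))
```

A lift above 1 means the two item sets appear together more often than independence would predict, a lift of about 1 means no association, and a lift below 1 means they tend to exclude each other.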

107. Where can a website operator generally find data on her customers' IP addresses?

Answers:

• all of the above

• HTTP request headers

• cookies

• server logfiles

108. Hash-based technique, Transaction Reduction, Partitioning, Sampling, and Dynamic Item Counting are all examples of what?

Answers:

• Methods to repeatedly scan the database and check a large set of candidates by pattern matching.

• Techniques to improve the efficiency of an Apriori algorithm

• Methods of generating frequent item sets without candidate generation.

• Methods for finding a compact description for a subset of data.

109. Data mining provides a link between:

Answers:

• Parallel processing and RAID

• Separate transactional and analytical systems

• Online analytical processing and dynamic information

• Genetic algorithms and logistic regression

110. A descriptive approach to exploring data that can help identify relationships among values in a database is:

Answers:

• Function activation

• Predictive analysis

• Clustering

• Link analysis

111. What is Hive?

Answers:

• Hive enables Hadoop to operate as a data warehouse.

• Hive is a programming language that simplifies the common tasks of working with Hadoop.

• Both of these

112. What is the purpose of the Hadoop Distributed File System (HDFS)?

Answers:

• Creating a context in which there are no restrictions on the data, enabling it to be unstructured and schemaless.

• All of these.

• Ensuring that data is replicated with redundancy across the cluster.

• To enable computation to take place by allowing each server to have access to the data.

113. The silhouette coefficient can be used to determine the natural number of clusters for ________.

Answers:

• Density Based Algorithms

• Hierarchical Algorithms

• Subspace Clustering Algorithms

• Partitioning Algorithms
