The rapid development of information technology, in particular progress in methods of data collection, storage, and processing, has allowed many organizations to collect massive amounts of data that must be analyzed. The volume of data is so large that human experts alone cannot analyze it. This led to the emergence of a tool such as Data Mining.
Data Mining is a collective term for the set of methods for discovering in data previously unknown, non-trivial, practically useful, and interpretable knowledge needed for decision-making in various spheres of human activity.
The knowledge obtained with Data Mining describes new relationships between properties and makes it possible to predict the values of some characteristics based on others.
The range of tasks that are solved by Data mining includes:
- Classification - assigning objects to one of a set of predefined classes.
- Association - revealing associative chains (events or items that tend to occur together).
- Clustering - grouping events and observations into clusters of similar objects.
- Forecasting - predicting possible future developments, both progressive and regressive, on the basis of available data.
- Analysis of changes - identifying patterns that describe typical situations.
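The clustering task can be illustrated with k-means, one common clustering algorithm (the post does not name a specific one). This is a minimal sketch, assuming observations are 2-D numeric tuples and Euclidean distance; the function name `kmeans` and its parameters are illustrative, not part of any particular library.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids
```

Each observation ends up labeled with the index of its cluster; the number of clusters k has to be chosen in advance, which is one of the practical trade-offs of this method.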
Models of knowledge representation in Data Mining
- Artificial Neural Networks
- Decision trees, symbolic rules
- Nearest neighbor and k-nearest neighbor methods
- Support vector machines
- Linear Regression
- Hierarchical cluster analysis methods
- Limited enumeration method
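As an example of one of these models, the k-nearest neighbor method classifies a new object by looking at the most similar known objects. The sketch below is a minimal illustration, assuming numeric feature tuples, Euclidean distance, and a simple majority vote; the name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote of its k nearest neighbors.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the k closest labels (ties go to the label seen first).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
             ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high")]
    print(knn_predict(train, (1.5, 1.5)))
```

With k=1 this reduces to the plain nearest neighbor method; larger k values make the prediction less sensitive to a single noisy example.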
Most of the analytical methods used in Data Mining are well-established mathematical algorithms and methods.
Properties of Data Mining methods
Various Data Mining methods are characterized by certain properties that can be decisive when choosing a data analysis method. The methods can be compared with each other by evaluating these properties.
The basic properties of Data Mining methods are accuracy, scalability, interpretability, verifiability, labor intensity, flexibility, speed, and popularity.
Classification of methods
Data Mining methods are usually divided into two groups:
- statistical methods, based on averaged past experience as reflected in retrospective (historical) data;
- cybernetic methods, comprising many heterogeneous mathematical approaches.
Statistical methods of Data mining
Statistical Data Mining methods are classified into four groups:
- Descriptive analysis of the original data.
- Analysis of relationships (correlation and regression analysis, factor analysis, analysis of variance).
- Multivariate statistical analysis (component analysis, discriminant analysis, multivariate regression analysis, canonical correlation, and others).
- Time series analysis (dynamic modeling and forecasting).
Cybernetic methods of Data Mining
This group includes the following methods:
- artificial neural networks (pattern recognition, clustering, forecasting);
- evolutionary programming;
- genetic algorithms (optimization);
- associative memory (search for analogs, prototypes);
- fuzzy logic;
- decision trees.
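To show how genetic algorithms are used for optimization, here is a minimal sketch on a toy problem, assuming the goal is to maximize the number of 1s in a bit string (the classic "OneMax" example, not taken from the post). The operators used (tournament selection, one-point crossover, bit-flip mutation, and elitism) are common choices rather than the only possible ones; all names and parameter values are illustrative.

```python
import random


def fitness(bits):
    """Toy objective (OneMax): the number of 1s in the bit string."""
    return sum(bits)


def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    # Random initial population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: carry the two best individuals over unchanged.
        next_gen = population[:2]
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals is a parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # One-point crossover combines the two parents.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history
```

Because the best individuals are always kept, the best fitness never decreases from one generation to the next; the rest of the population keeps exploring new combinations.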
It should be noted that today Data Mining is most widely used to solve business problems.
In banking and other industries, Data Mining is applied to the following common tasks:
- credit card fraud detection. By analyzing past transactions that later turned out to be fraudulent, banks can identify patterns that signal fraud.
- customer segmentation. By dividing customers into categories, banks make their marketing efforts more targeted and efficient, offering different services to different customer groups.
- development of the automotive industry. When assembling cars, manufacturers must take into account the requirements of each customer, so they need to be able to predict the popularity of certain features and to know which features are usually ordered together.
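Knowing "which features are usually ordered together" is an association task. The sketch below counts how often pairs of options appear in the same order and reports support (share of orders containing both) and confidence (how often one option implies the other), the basic measures behind association-rule methods. It is a brute-force pair count rather than a full algorithm such as Apriori, and the option names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(orders, min_support=0.3):
    """Find option pairs that are frequently ordered together.

    Returns {(antecedent, consequent): (support, confidence)} for every pair
    whose support (share of orders containing both options) >= min_support.
    """
    n = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = sorted(set(order))  # drop duplicates and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Confidence of a -> b: share of orders with a that also contain b.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


if __name__ == "__main__":
    # Hypothetical car orders, each a set of chosen options.
    orders = [
        {"sunroof", "leather seats", "navigation"},
        {"sunroof", "leather seats"},
        {"navigation", "tow hitch"},
        {"sunroof", "leather seats", "tow hitch"},
    ]
    for (a, b), (sup, conf) in pair_rules(orders, 0.5).items():
        print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}")
```

A rule with high support and confidence, such as one option almost always appearing with another, tells a manufacturer which combinations to plan for when assembling cars.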
Follow me to learn more about popular science, math, and technology.
With Love,
Kate