AN INTRODUCTION TO DATA ANALYSIS



Understanding the Nature of the Data

The object of study of data analysis is basically the data. The data are therefore the key players in every process of data analysis. They constitute the raw material to be processed, and thanks to their processing and analysis it is possible to extract a variety of information and so increase the level of knowledge of the system under study, that is, the system from which the data came.

When the Data Become Information

Data are the events recorded in the world. Anything that can be measured or even categorized can be converted into data. Once collected, these data can be studied and analyzed, both to understand the nature of the events and, very often, to make predictions or at least to make informed decisions.

When the Information Becomes Knowledge.

You can speak of knowledge when the information is converted into a set of rules that help you to better understand certain mechanisms and, consequently, to make predictions about the evolution of some events.

Types of Data

The data can be divided into two distinct categories:

•	categorical
	•	nominal
	•	ordinal
•	numerical
	•	discrete
	•	continuous

Categorical data are values or observations that can be divided into groups or categories. There are two types of categorical values: nominal and ordinal. A nominal variable has no intrinsic order among its categories. An ordinal variable, instead, has a predetermined order.

Numerical data are values or observations that come from measurements. There are two different types of numerical values: discrete and continuous numbers. Discrete values can be counted and are distinct and separated from each other. Continuous values, on the other hand, are produced by measurements or observations and can assume any value within a defined range.
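As a small illustration, the four kinds of values described above can be sketched in plain Python (the variable names and sample values here are invented for the example):

```python
# A minimal sketch of the four kinds of data values.
nominal = {"red", "blue", "green"}       # categories with no intrinsic order
ordinal = ["small", "medium", "large"]   # categories with a predetermined order
discrete = [3, 7, 2]                     # countable, distinct, separated values
continuous = [1.52, 3.81, 0.97]          # any value within a defined range

# Ordinal values support meaningful comparison via their position in the order
rank = {value: position for position, value in enumerate(ordinal)}
assert rank["small"] < rank["large"]

# Discrete values are counted; continuous values are measured
print(sum(discrete))                 # total of counted values
print(min(continuous), max(continuous))  # range of measured values
```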

The Data Analysis Process.

Data analysis can be described as a process consisting of several steps in which the raw data are transformed and processed in order to produce data visualizations and to make predictions by means of a mathematical model based on the collected data. Data analysis is then nothing more than a sequence of steps, each of which plays a key role in the subsequent ones. So data analysis is almost schematized as a process chain consisting of the following sequence of stages:

•	Problem definition

•	Data extraction

•	Data cleaning

•	Data transformation

•	Data exploration

•	Predictive modeling

•	Model validation/test

•	Visualization and interpretation of results

•	Deployment of the solution

Figure: a schematic representation of all the processes involved in the data analysis.
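The stage chain above can be sketched as a sequence of functions, each one consuming the output of the previous one. This is a purely illustrative sketch: the sample data and the stage bodies are invented, and real stages would of course be far richer.

```python
# An illustrative sketch of the data analysis process chain:
# extraction -> cleaning -> transformation -> exploration.

def extract():
    # In practice: read from files, databases, or the Web.
    return [" 1", "2 ", "x", "3"]

def clean(raw):
    # Drop values that cannot be parsed as numbers.
    cleaned = []
    for item in raw:
        item = item.strip()
        if item.isdigit():
            cleaned.append(int(item))
    return cleaned

def transform(values):
    # For example, normalize the values to the range [0, 1].
    top = max(values)
    return [v / top for v in values]

def explore(values):
    # Summarize: here, just the mean of the transformed values.
    return sum(values) / len(values)

data = transform(clean(extract()))
print(explore(data))
```

Each stage depends only on the previous stage's output, which is exactly why a poor result in an early stage (for example, careless extraction) propagates through the whole chain.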

Problem Definition

The process of data analysis actually begins long before the collection of raw data. In fact, data analysis always starts with a problem to be solved, which needs to be defined. The problem is defined only after you have focused well on the system you want to study: this may be a mechanism, an application, or a process in general. Generally the study is carried out in order to better understand the system's operation, but in particular it will be designed to understand the principles of its behavior in order to be able to make predictions, or to make choices (defined as informed choices).

The definition step and the corresponding documentation (deliverables) of the scientific or business problem are both very important in order to focus the entire analysis strictly on getting results. In fact, a comprehensive or exhaustive study of the system is sometimes complex, and you do not always have enough information to start with. So the definition of the problem, and especially its planning, can uniquely determine the guidelines to follow for the whole project.

Once the problem has been defined and documented, you can move to the project planning of the data analysis. Planning is needed to understand which professionals and resources are necessary to meet the requirements and carry out the project as efficiently as possible. So you are going to consider the issues in the area involving the resolution of the problem. You will look for specialists in the various areas of interest and finally install the software needed to perform the data analysis.

Thus, during the planning phase, the choice of an effective team takes place. Generally, these teams should be cross-disciplinary in order to solve the problem by looking at the data from different perspectives. The choice of a good team is certainly one of the key factors leading to success in data analysis.

Data Extraction

Once the problem has been defined, the first step is to obtain the data in order to perform the analysis. The data must be chosen with the basic purpose of building the predictive model, so their selection is crucial for the success of the analysis as well. The sample data collected must reflect the real world as much as possible, that is, how the system responds to stimuli from the real world. In fact, even huge sets of raw data, if not collected competently, may portray false or unbalanced situations compared to the actual ones. Thus, a poor choice of data, or performing analysis on a data set that is not perfectly representative of the system, will lead to models that move away from the system under study.

The search and retrieval of data often require a form of intuition that goes beyond mere technical research and data extraction. This process also requires a careful understanding of the nature of the data and their form, which only good experience and knowledge of the problem's application field can give.

Regardless of the quality and quantity of data needed, another issue is the search for, and correct choice of, data sources. If the study environment is a laboratory (technical or scientific) and the data generated are experimental, then the data source is easily identifiable. In this case, the problems will concern only the experimental setup.

But it is not possible in every field of application to reproduce systems in which data are gathered in a strictly experimental way. Many fields require searching for data in the surrounding world, often relying on external experimental data, or even more often collecting them through interviews or surveys. In these cases, finding a good data source that is able to provide all the information you need for the data analysis can be quite challenging. Often it is necessary to retrieve data from multiple data sources to supplement any shortcomings, to identify any discrepancies, and to make the data set as general as possible.

When you want to get the data, a good place to start is the Web. But most of the data on the Web can be difficult to capture; in fact, not all data are available in a file or database, but may be content that lies, more or less implicitly, inside HTML pages in many different formats. To this end, a methodology called Web Scraping has been developed, which allows the collection of data through the recognition of specific occurrences of HTML tags within web pages. There is software specifically designed for this purpose: once an occurrence is found, it extracts the desired data. When the search is complete, you will get a list of data ready to be subjected to the data analysis.
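A minimal sketch of this tag-recognition idea can be written with the standard library alone. The HTML snippet and the class names below are invented for illustration; real scraping projects typically use dedicated libraries such as BeautifulSoup or Scrapy.

```python
from html.parser import HTMLParser

# An invented HTML fragment standing in for a page found on the Web.
HTML = """
<table>
  <tr><td class="city">Rome</td><td class="temp">21</td></tr>
  <tr><td class="city">Milan</td><td class="temp">18</td></tr>
</table>
"""

class TempScraper(HTMLParser):
    """Recognizes occurrences of <td> tags and extracts their data."""

    def __init__(self):
        super().__init__()
        self.capture = None   # class of the <td> we are currently inside
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.capture = dict(attrs).get("class")

    def handle_data(self, data):
        if self.capture == "city":
            self.rows.append({"city": data})
        elif self.capture == "temp":
            self.rows[-1]["temp"] = int(data)
        self.capture = None

scraper = TempScraper()
scraper.feed(HTML)
print(scraper.rows)   # a list of data ready for analysis
```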

Data Exploration/Visualization

Exploring the data is essentially the search for data in a graphical or statistical presentation, in order to find patterns, connections, and relationships in the data. Data visualization is the best tool to highlight possible patterns.

In recent years, data visualization has developed to such an extent that it has become a real discipline in itself. Numerous technologies are utilized exclusively for the display of data, and equally many are the types of display applied to extract the best possible information from a data set.

Data exploration consists of a preliminary examination of the data, which is important for understanding the type of information that has been collected and what it means. In combination with the information acquired during the problem definition, this categorization will determine which method of data analysis will be most suitable for arriving at a model definition.

Generally, this phase, in addition to a detailed study of charts through data visualization, may consist of one or more of the following activities:

•	Summarizing data

•	Grouping data

•	Exploration of the relationship between the various attributes

•	Identification of patterns and trends

•	Construction of regression models

•	Construction of classification models

Generally, data analysis requires processes of summarization of statements regarding the data to be studied. Summarization is a process by which data are reduced for interpretation without sacrificing important information.

Clustering is a method of data analysis that is used to find groups united by common attributes (grouping).

Another important step of the analysis focuses on the identification of relationships, trends, and anomalies in the data. In order to find this kind of information, one often has to resort to additional tools, as well as performing another round of data analysis, this time on the data visualization itself.

Other methods of data mining, such as decision trees and association rules, automatically extract important facts or rules from the data. These approaches can be used in parallel with data visualization to find information about the relationships between the data.
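The first two activities, summarizing and grouping, can be sketched in a few lines of Python. The records below are invented sample data; grouping by a shared attribute is a deliberately crude stand-in for real clustering methods.

```python
from statistics import mean

# Invented sample data: observations with a common attribute "group".
records = [
    {"group": "A", "value": 10},
    {"group": "B", "value": 7},
    {"group": "A", "value": 14},
    {"group": "B", "value": 5},
]

# Summarizing: reduce the data to a few interpretable numbers
# without sacrificing important information.
values = [r["value"] for r in records]
summary = {"count": len(values), "mean": mean(values),
           "min": min(values), "max": max(values)}
print(summary)

# Grouping: collect the values united by a common attribute,
# then summarize each group separately.
groups = {}
for r in records:
    groups.setdefault(r["group"], []).append(r["value"])
print({g: mean(v) for g, v in groups.items()})   # mean per group
```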

Quantitative and Qualitative Data Analysis

Data analysis is therefore a process completely focused on data, and, depending on the nature of the data, it is possible to make some distinctions. When the analyzed data have a strictly numerical or categorical structure, you are talking about quantitative analysis; when you are dealing with values that are expressed through descriptions in natural language, you are talking about qualitative analysis. Precisely because of the different nature of the data processed by the two types of analyses, you can observe some differences between them.

Quantitative analysis has to do with data that have a logical order within them, or that can be categorized in some way. This leads to the formation of structures within the data. The order, categorization, and structures in turn provide more information and allow further processing of the data in a more strictly mathematical way. This leads to the generation of models that can provide quantitative predictions, thus allowing the data analyst to draw more objective conclusions.

Qualitative analysis instead has to do with data that generally do not have a structure, at least not one that is evident, and whose nature is neither numeric nor categorical. For example, data for a qualitative study could include written textual, visual, or audio data. This type of analysis must therefore be based on methodologies, often ad hoc, to extract information that will generally lead to models capable of providing qualitative predictions, with the result that the conclusions the data analyst arrives at may also include subjective interpretations. On the other hand, qualitative analysis can explore more complex systems and draw conclusions that are not possible with a strictly mathematical approach. Often this type of analysis involves the study of systems such as social phenomena or complex structures that are not easily measurable.
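A toy contrast between the two kinds of analysis might look like this (the measurements and the review text are invented; word counting stands in for the far richer ad hoc methods real qualitative analysis uses):

```python
# Quantitative: numeric data supports strictly mathematical processing.
measurements = [2.0, 4.0, 6.0]
average = sum(measurements) / len(measurements)
print(average)

# Qualitative: unstructured natural-language data needs ad hoc methods,
# for example a simple word-frequency count over a product review.
review = "great product great price poor packaging"
freq = {}
for word in review.split():
    freq[word] = freq.get(word, 0) + 1
print(freq)
```

The quantitative branch yields an objective number; interpreting the word counts in the qualitative branch (is "great" outweighing "poor"?) still involves a subjective judgment, which mirrors the distinction drawn above.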

Open Data 

In support of the growing demand for data, a huge number of data sources are now available on the Internet. These data sources provide information freely to anyone in need, and they are called Open Data. Here is a list of some Open Data available online; you can find a more complete list and details of the Open Data available online in the Appendix.

•	DataHub (http://datahub.io/dataset)

•	World Health Organization (http://www.who.int/research/en/)

•	Data.gov (http://data.gov)

•	European Union Open Data Portal (http://open-data.europa.eu/en/data/)

•	Amazon Web Service public datasets (http://aws.amazon.com/datasets)

•	Facebook Graph (http://developers.facebook.com/docs/graph-api)

•	Healthdata.gov (http://www.healthdata.gov)

•	Google Trends (http://www.google.com/trends/explore)

•	Google Finance (https://www.google.com/finance)

•	Google Books Ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)

•	Machine Learning Repository (http://archive.ics.uci.edu/ml/)

In this regard, to give an idea of the open data sources available online, you can look at the LOD cloud diagram (http://lod-cloud.net), which displays the connections between several open data sources.
