What is the Most Original Way to Build Your Data Analysis Project From Scratch?

Sometimes Creating New Data Is Your Best Choice

Juan Moctezuma-Flores
Feb 16, 2021
Photo by Markus Winkler on Unsplash

When it comes to data sourcing (the process of obtaining structured or unstructured information), creating a new dataset might be more efficient than using someone else’s data. Why? Because you’ll have a dataset that is both original and unique. Don’t get me wrong: using data from other sources is great, and sometimes necessary, depending on what you are doing. However, if you want to develop data analytics skills and critical-thinking abilities, it’s better to create your project from scratch.

Using data from an external source might involve downloading it from a third party, requesting it through an API or using a web scraper. Downloadable datasets and APIs are not always free, and you have to be careful with web scraping because it can lead to copyright infringement. If your choices are limited and you can’t find the data that you need, then build and develop the dataset yourself!

What To Do Before You Start Building New Data?

Before you do anything, ask yourself a question (or a few) about something that you want to know or that interests you! Your project can be about anything, but set specific and realistic goals before compiling your data. You don’t want to waste your time and resources on something irrelevant. Also, be sure you know exactly which questions you are asking. You don’t want to start gathering information and later realize that you forgot to collect something relevant to your objective. If you are a job-seeker, I highly recommend that you build your project around something related to marketing, finance, logistics, retail, accounting, etc., so that you can demonstrate business acumen through a data-driven approach. Before you present your project, ask yourself: How is my work relevant to a real-world application? How is my analysis going to bring value to an industry or employer?

Photo by Kelly Sikkema on Unsplash

How Can You Compile Your Data?

You can gather your information passively or actively. A passive strategy means observing and recording the data that you need. For instance, a delivery associate working with Amazon might compile performance data based on his/her own reliability and quality-delivery rate. This is a passive approach because the individual is only keeping track of what he/she observes.

Photo by Pawel Janiak on Unsplash

An active approach means directly intervening in a situation in order to get your data. For instance, an individual interested in opening his/her own food stand would have to ask subjects from a random sample (individuals from all sorts of backgrounds) about their favorite type of street food in order to understand what the public demands. Please note that a ‘sample’ is a subset of an entire population that is studied in order to draw conclusions about that population. In situations such as this one, there is no way to obtain the desired data without direct interaction, which sometimes means speaking directly with people.
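To make the idea of a random sample more concrete, here is a minimal Python sketch of the food stand example. Everything in it is an assumption made for illustration: the list of 500 people, the sample size of 50, the four food options and the simulated answers are all made up, and in a real project the responses would come from the people you actually ask.

```python
import random
from collections import Counter

# Hypothetical sampling frame: everyone you could potentially survey.
population = [f"person_{i}" for i in range(1, 501)]

# Draw a simple random sample of 50 people without replacement.
rng = random.Random(42)  # fixed seed so the draw is reproducible
sample = rng.sample(population, k=50)

# Made-up answers standing in for real survey responses.
food_options = ["tacos", "hot dogs", "dumplings", "falafel"]
responses = {person: rng.choice(food_options) for person in sample}

# Tally the answers to see which street food is most popular in the sample.
tally = Counter(responses.values())
for food, count in tally.most_common():
    print(f"{food}: {count} ({count / len(sample):.0%})")
```

Drawing the sample at random, rather than only asking friends or people nearby, is what gives every member of the population the same chance of being included.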

Photo by Celpax on Unsplash

Active methods may include interviews, surveys, card sorting and experimentation. What ‘experimentation’ means and how you conduct it is up to you; in data analytics, however, it may involve laboratory equipment, data science techniques, automated testing or advanced programming. If your project doesn’t require advanced knowledge in a specific field, then chances are that you don’t have to worry about this method.

What Type of Data Are You Gathering?

You can work with qualitative and quantitative data. Qualitative data is descriptive and includes text, images, video and audio. Quantitative data is numerical information such as percentages, sums, etc. In most data analysis projects you will most likely work with quantitative data, because you need numbers to build data visualizations or dashboards. You don’t necessarily have to come up with complicated formulas; basic statistics should be more than enough as long as your computations make sense and are aligned with your project’s goal. In addition, no matter what data category you are working with, please remember to clean your data: keep your information in order, use clear naming conventions (provide an explanation if necessary), avoid incomplete rows, use the appropriate data type or format, etc.
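As a rough illustration of cleaning followed by basic statistics, here is a small Python sketch that uses only the standard library. The column names (`date`, `packages_delivered`, `on_time_rate`) and all of the values are hypothetical, loosely based on the delivery example above; the cleaning rule (drop any row with a missing value) is just one possible choice.

```python
import statistics

# Hypothetical raw records, e.g. typed in by hand from a delivery log.
raw_rows = [
    {"date": "2021-02-01", "packages_delivered": "182", "on_time_rate": "0.97"},
    {"date": "2021-02-02", "packages_delivered": "175", "on_time_rate": ""},
    {"date": "2021-02-03", "packages_delivered": " 190 ", "on_time_rate": "0.99"},
    {"date": "2021-02-04", "packages_delivered": "168", "on_time_rate": "0.95"},
]

def clean(rows):
    """Drop incomplete rows and convert text fields to numeric types."""
    cleaned = []
    for row in rows:
        if any(value.strip() == "" for value in row.values()):
            continue  # skip rows with a missing value
        cleaned.append({
            "date": row["date"].strip(),
            "packages_delivered": int(row["packages_delivered"].strip()),
            "on_time_rate": float(row["on_time_rate"].strip()),
        })
    return cleaned

rows = clean(raw_rows)
packages = [r["packages_delivered"] for r in rows]

print("rows kept:", len(rows))
print("mean packages:", statistics.mean(packages))
print("median packages:", statistics.median(packages))
```

Whether you drop incomplete rows or fill them in some other way depends on your project’s goal; the point is to make that decision explicit and apply it consistently.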

Photo by Luke Chesser on Unsplash

Other Prerequisites For Building and Launching Your Project

You don’t need to come from a specific background as long as you have basic math and statistics knowledge, computer literacy, data entry skills and perhaps some programming skills; how much coding ability you need depends on what you are building. You should also have basic to intermediate knowledge of Microsoft Office, namely Excel, PowerPoint and Word.

Photo by Mika Baumeister on Unsplash

If you are a job-seeker, your analysis project should definitely include a technical or research paper in addition to a PowerPoint presentation. You might be wondering why writing a paper is necessary. Not only are effective writing and communication skills required in data analytics, but some employers also require candidates to include ‘writing samples’ of relevant work in their job application or during the hiring process.

Photo by Annie Spratt on Unsplash

PowerPoint presentations are a great way to summarize and visually demonstrate your project. However, if you do decide to write a paper, you should ideally include the following:

  • Formal introduction and abstract
  • Description of your data collection and exploration experience, in other words, what methods and tools you used to compile your data and what trends or patterns you observed
  • Description of your dataset such as column or header names, data types within the spreadsheet, etc.
  • Explanation of your numerical results which may include graphs, tables, or charts
  • Explanation of how your analysis is relevant towards real-world applications or specific industries or sectors
  • Conclusion and links (references in the format of your choice)

At this point you might be asking yourself, “How am I going to publish my project as open source?” Open a GitHub account (it’s free), create a repository, upload your work or files, and include a clear description of your project in the ‘readme’ section. Once you have your repository, it gets its own link or URL that you may share! What I’ve personally done in the past is include my projects’ links in my resume.

Photo by Luke Chesser on Unsplash

Once you have your project done, you may continue enhancing your technical skillset by going the extra mile. How? By building things around your dataset. You may use free and open source tools available on the internet to store your data in an SQL database, build an ETL pipeline, create visuals in Tableau Public, write cleansing scripts in Python via Jupyter Notebooks, etc. You don’t have to do everything listed in the previous sentence, but the advantage of any data analysis project is that you may get creative in many ways!
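For instance, storing a small dataset in an SQL database could look like the following sketch, which uses the `sqlite3` module from Python’s standard library. The table name `street_food`, its columns and the vote counts are all hypothetical and only serve as an illustration; swap in your own dataset and schema.

```python
import sqlite3

# Hypothetical survey results; replace with your own dataset.
survey_rows = [
    ("tacos", 21),
    ("dumplings", 14),
    ("hot dogs", 9),
]

# An in-memory database keeps the sketch self-contained;
# pass a file name such as "project.db" to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE street_food (food TEXT PRIMARY KEY, votes INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO street_food (food, votes) VALUES (?, ?)", survey_rows
)
conn.commit()

# Query the table the same way a report or dashboard might.
top_food, top_votes = conn.execute(
    "SELECT food, votes FROM street_food ORDER BY votes DESC LIMIT 1"
).fetchone()
total_votes = conn.execute("SELECT SUM(votes) FROM street_food").fetchone()[0]

print(f"Most popular: {top_food} ({top_votes} of {total_votes} votes)")
conn.close()
```

SQLite is only one option; it is convenient here because it needs no server and ships with Python.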

Conclusion

Whether you are a beginner or an experienced analyst, you can always come up with an original dataset and build your project around it. Keep your work simple at first; as you build your dataset and add more things (such as data visualizations, PowerPoints, coding scripts, or technical papers and documentation, if applicable), you’ll notice that your project becomes more complex. Remember that there are no fixed guidelines or templates for building these types of projects, and almost anyone from any background can create an analysis project by passively or actively obtaining data. Data analytics is relevant to virtually every field, sector and industry.
