Data Concerns:

How was the data you are using collected?
What assumptions is your model making by learning from this dataset?
Is this dataset representative enough to produce a useful model?

Modeling Concerns:

How could the results of your work be misused?
What is the intended use and scope of your model?

Data Collection:

  • Massive Datasets: Machine learning models typically need large amounts of data to learn reliable patterns. This data can come from various sources, including public databases, sensor readings, user interactions, and even simulations.
  • Collection Methods: The methods used depend on the data source. For instance, web scraping might be used for public data, while surveys or app integration might be used for user-generated data.

Assumptions and Bias:

  • Underlying Patterns: Models are trained to identify patterns in the data. These patterns are assumed to hold for future data, which is not guaranteed: if conditions change after the data was collected, the learned patterns may no longer apply.
  • Bias from Data: The data itself can be biased, reflecting the way it was collected or existing societal biases. A model trained on biased data tends to reproduce those biases in its outputs, and can amplify them.
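
One simple way to look for bias in the data itself is to compare how often a positive label occurs in each group before any model is trained. The sketch below is only an illustration: the records, the field names "group" and "approved", and the helper positive_rate_by_group are all hypothetical. A large gap does not prove unfairness, but it is a signal worth investigating.

```python
from collections import defaultdict

def positive_rate_by_group(records, group_key, label_key):
    """Return the share of positive labels (label == 1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[label_key] == 1:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}

# Hypothetical toy data; the field names are made up for this example.
records = [
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 0},
    {"group": "B", "approved": 1},
    {"group": "B", "approved": 0},
    {"group": "B", "approved": 0},
    {"group": "B", "approved": 0},
]

rates = positive_rate_by_group(records, "group", "approved")
gap = max(rates.values()) - min(rates.values())
print(rates)  # share of positive labels per group
print(gap)    # difference between the highest and lowest group rate
```

A model trained on these labels would likely learn that group membership predicts the outcome, which is how a skew in the data becomes a skew in the predictions.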

Representativeness and Generalizability:

  • Generalizability Goal: The goal is to create a model that works well on new, unseen data. This depends on how well the training data represents the real-world scenario the model will be used in.
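  • Limited Data Issues: If the training data is limited or not diverse enough, the model might not perform well on unseen data. Two distinct problems can cause this. A model that fits the quirks and noise of a small training set instead of the underlying pattern is overfitting. A model trained on data that does not match the setting it is deployed in suffers from sampling bias, which collecting more data of the same kind does not fix.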
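
The gap between performance on training data and on unseen data can be measured directly by holding some data back. The sketch below is a minimal illustration on synthetic data, not a recommended evaluation setup: the data generator, the 20% label noise, and the 1-nearest-neighbor predictor are all assumptions chosen because such a model memorizes its training set. It scores perfectly on data it has seen and worse on data it has not.

```python
import random

def make_data(n, rng, noise=0.2):
    """Synthetic data: label is 1 when x >= 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Predict the label of the single closest training point (pure memorization)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN predictor gets right."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)
held_out = make_data(200, rng)  # same distribution, never seen during "training"

train_acc = accuracy(train, train)
held_out_acc = accuracy(train, held_out)
print(f"train accuracy:     {train_acc:.2f}")
print(f"held-out accuracy:  {held_out_acc:.2f}")
print(f"generalization gap: {train_acc - held_out_acc:.2f}")
```

Note that the held-out set here comes from the same distribution as the training set, so this check only detects memorization. It says nothing about whether either set represents the real-world scenario the model will be used in.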

Misuse of Results:

  • Unintended Consequences: A model designed for one purpose could be misused for another, potentially leading to unfair or discriminatory outcomes.
  • Transparency Issues: If the inner workings of a model are not transparent, it can be difficult to identify and address potential biases or errors.

Intended Use and Scope:

  • Clearly Defined Goals: Machine learning models are built for specific purposes. It's crucial to define the intended use and scope clearly from the outset.
  • Responsible Development: Developers should identify and document potential biases and limitations during development so that the model can be used responsibly.
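
One way to make intended use and scope concrete is to record them alongside the model in a small structured form, loosely in the spirit of a model card. The sketch below is hypothetical throughout: the ModelCard class, its fields, and the example values are invented for illustration and do not follow any standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """A lightweight, hypothetical record of what a model is for."""
    name: str
    intended_use: str
    in_scope_inputs: set = field(default_factory=set)
    known_limitations: list = field(default_factory=list)

    def is_in_scope(self, input_kind: str) -> bool:
        """Return True only for input kinds the model was built and evaluated for."""
        return input_kind in self.in_scope_inputs

# Example values are made up for this sketch.
card = ModelCard(
    name="toy-sentiment-classifier",
    intended_use="Rank English product reviews by sentiment for internal triage.",
    in_scope_inputs={"english_product_review"},
    known_limitations=["Not evaluated on non-English text."],
)

print(card.is_in_scope("english_product_review"))
print(card.is_in_scope("medical_note"))
```

Writing the scope down does not prevent misuse by itself, but it gives reviewers and downstream users something explicit to check a proposed use against.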





