Saskatoon SAS user group

Presentation Transcript

  • 1. Saskatoon SAS user group: Efficiency and data mining?
  • 2. Agenda: Background; Case Study
  • 3. Agenda: Background; Case Study
  • 4. Predictive Analytics… Data science… Statistics… Machine Learning… Data mining: it means different things to different people. Audiences: Data Scientist; Business Analyst; IT Management; Executive. Slide callouts: Uses a variety of tools; Heavy Excel user; Consistent answers; Show me the easy button; Show me the power; How do we manage this?; So what?; Tries to avoid next migraine.
  • 5. The Data Mining Process: This process is your friend. Use it. Iterate. Fail fast. (Diagrams: SEMMA process; CRISP-DM methodology.) CRISP-DM is a good methodology. SEMMA is a process in Enterprise Miner; it aligns well with CRISP-DM.
  • 6. Building a predictive model: 3 approaches. (1) Rapid Predictive Modeler (RPM): preconfigured Enterprise Miner workflow in Enterprise Guide; easy; quick; good models; auditable and reusable. (2) Enterprise Miner: visual workflows; powerful; medium difficulty; great models; auditable and reusable. (3) Programming: difficult to learn; some data scientists prefer this; not suitable for the business analyst.
  • 7. The Data Mining Process: How to add efficiency. Use visualization early in the process. Don’t be afraid to build models; start with RPM. Fail fast. Understand the problem. Understand the data.
  • 8. Agenda: Background; Case Study
  • 9. The Data Mining Process: Case study. "We have a problem!" "Use actionable, in-memory, big-data, cloud, machine-learning analytics to fix it." "You mean use predictive modeling to find the trucks that are going to blow up?" "Last time it was altitude related."
  • 10. 40 000 vehicles; the fleet is ageing. Trucks are equipped with telematics. The data scientist is on vacation. Dataset = 1.5 GB (2M rows)! "My spreadsheet won’t open it…" Roles: Business Analyst; Data Scientist.
  • 11. Case study: What I am going to show you. Demo 1: Visual exploration of timeline; cluster analysis. Use visualization early in the process to formulate a strategy.
  • 12. Case study: What I am going to show you. Demo 2: Feature engineering; 2-minute model; Enterprise Model; Rapid Predictive Modeler; Enterprise Miner. Don’t be afraid to model.
  • 13. Case study: What I am going to show you. Demo 3: Create score code; geospatial representation of scored data. This is how we derive value from the model.
  • 14. Sample & Explore Data: Missing data is a landmine; identify and remediate. Visualize: reconstruct a timeline. Explore before subsetting or filtering. Demo 1: Visual exploration of timeline; cluster analysis.
  • 17. Sample & Explore Data: Now that I understand the data, I have a plan. Sample only alternator faults. Focus on recent data; using all the history may pollute my model. Cluster analysis in Visual Analytics.
  • 18. Modify, Model, Assess: Use Rapid Predictive Modeler to fail fast. Look at the variable importance chart. Engineer features into the data. Mitigate the risk of overfitting (holdouts, model selection criteria). Demo 2: Feature engineering; RPM; advanced EM model.
  • 19. Modify Data: Engineered features. Binning into deciles: altitude; engine hours; years in service; odometer mileage; oil temp; water temp. Computed variables: RPM; days since service origin; water temp * oil temp. Binning into quartiles: speed; RPM; water temp * oil temp; days since service origin.
  • 20. Modify, Model, Assess: We improve the model by iterating.
  • 21. Pre-release version of SAS Visual Data Mining and Machine Learning
  • 22. Deploy: How will the model output be used by someone who knows nothing about data science? Score code is useful; a model is not. Visualize the output. Demo 3: Create score code; geospatial representation of scored data.
  • 23. Deploy: Out of a truck fleet of 2000+, 72 have fault codes on alternators; 12 are prioritized for maintenance based on the prediction. This is where they are.
  • 24. The Data Mining Process: How to add efficiency. Use visualization early in the process. Don’t be afraid to build models; it is easy, start with RPM. Fail fast.
  • 25. Ideas? Questions?
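
The "Modify Data" slide lists engineered features: decile and quartile binning plus computed variables such as days since service origin and water temp * oil temp. As a rough illustration of that step outside of Enterprise Miner, here is a minimal Python sketch using only the standard library. The field names, the sample fleet, and the helpers `quantile_bin` and `engineer_features` are hypothetical and are not taken from the presentation or from any SAS product.

```python
from bisect import bisect_right
from datetime import date
from statistics import quantiles


def quantile_bin(values, n_bins):
    """Return a 1-based bin number (1..n_bins) for each value.

    Cut points are the empirical quantiles of `values`; a value equal to a
    cut point falls into the higher bin.
    """
    cuts = quantiles(values, n=n_bins, method="inclusive")  # n_bins - 1 cut points
    return [bisect_right(cuts, v) + 1 for v in values]


def engineer_features(rows, as_of):
    """Add computed variables and their binned versions to each truck record."""
    # Computed variables (hypothetical field names).
    days = [(as_of - r["service_start"]).days for r in rows]
    temp_product = [r["water_temp"] * r["oil_temp"] for r in rows]

    # Binning: deciles for a raw reading, quartiles for the computed variables.
    altitude_decile = quantile_bin([r["altitude_m"] for r in rows], 10)
    days_quartile = quantile_bin(days, 4)
    product_quartile = quantile_bin(temp_product, 4)

    enriched = []
    for i, r in enumerate(rows):
        enriched.append({
            **r,
            "days_in_service": days[i],
            "temp_product": temp_product[i],
            "altitude_decile": altitude_decile[i],
            "days_in_service_quartile": days_quartile[i],
            "temp_product_quartile": product_quartile[i],
        })
    return enriched


# Made-up sample fleet, for illustration only.
trucks = [
    {
        "truck_id": i,
        "altitude_m": 100 * i,
        "water_temp": 80 + i,
        "oil_temp": 90 + 2 * i,
        "service_start": date(2010 + i % 5, 1, 1),
    }
    for i in range(1, 21)
]

features = engineer_features(trucks, as_of=date(2020, 1, 1))
for row in features[:3]:
    print(row["truck_id"], row["altitude_decile"], row["days_in_service"],
          row["temp_product_quartile"])
```

Binned variables like these can then be passed to whichever modeling tool is in use; the bin edges here are computed from the sample itself, so scoring new data would need the same cut points saved and reused.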