Posts

Showing posts from March, 2019

What projects would you be interested in working on?

* Applications of the HEART metrics within usability testing
* How behavior modification and motivational theories can be applied to improve an existing system or flow so that it improves users' happiness, engagement, adoption, overall life satisfaction, and relationships
* Understanding and defining which metrics best capture human behavior
* Improving lifestyles and cognitive behaviors by improving app awareness, activation, discovery, accessibility, and usability, and exploring the use cases that improve these factors
* Understanding how various demographics behave with apps: how they organize, access, and learn about them, and what motivates them to search for and download apps

Imagine doing research in a specific UX in which half the people hate a specific design aspect (where it's actually completely impossible that this would ever happen in that particular UX). How would you solve that?

Look at the big picture: what are the tradeoffs for this design in terms of stakeholder goals? If this is specifically a design issue, would users hate it so much that they would not use the product? Does the design they hate have other benefits, such as fewer errors or shorter task times? I would bring these questions into the discussion to help team members zoom out and evaluate the bigger picture, aiming for a service design resolution rather than an interface design one.

How do you determine which task to use?

Interview stakeholders and users about the important goals for the product and about what is even possible to do with it. Are we looking for action or intent? Action comes down to more straightforward performance-related numbers, and the stakeholder and user interviews are enough to choose tasks for it. Intent is more interesting: it requires developing metrics to evaluate content, motivation, and attitudes, as well as the collective performance based on user actions or lack of action. Something like the HEART metrics would influence which tasks I use.

How many tasks to ask a specific user?

There is no definite answer to how many is enough, especially since you don't want to overwhelm or pressure users or make them believe the session is timed. Prepare a set of tasks, along with potential probing questions, that can be completed within the allotted time frame. How many tasks you can fit in is the more important question.

Should I run a field visit or usability test?

A field visit tells you if you're designing the right thing (big picture). A usability test tells you if you've designed the thing right (microscope). For example, your product might perform fine in a usability test but still fail if people don't really care about the tasks you've asked them to complete.
• Is there a user problem to be solved? (If unsure, carry out field research.)
• Have we solved it? (If unsure, carry out usability testing.)

Name an alternative to one-way ANOVA for independent groups (where ANOVA assumptions are not met). Name an alternative to repeated measures ANOVA.

What would be suitable follow-up tests to use instead of t-tests for each?
Non-parametric tests:
- Use them when your data do not meet ANOVA assumptions (e.g. not normally distributed and a small sample)
- Not as powerful as parametric tests
- Based on ranked data rather than means, so not skewed by extreme scores
Kruskal-Wallis: alternative to one-way ANOVA for independent groups
Friedman's ANOVA: alternative to repeated measures ANOVA
If the result is significant, follow up with pairwise comparisons (reducing the alpha level):
- Mann-Whitney tests for independent groups
- Wilcoxon tests for related groups
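To make "based on ranked data rather than means" concrete, here is a minimal sketch of the Kruskal-Wallis H statistic using only the Python standard library. The function names and sample data are illustrative assumptions, the tie correction is deliberately left out, and no p-value is computed; in practice a statistics package would handle both.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H for independent groups (no tie correction)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical scores for three independent groups.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h)
```

Because only the ranks enter the calculation, replacing the 9 with 900 would leave H unchanged, which is why extreme scores do not skew the result.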

How would you find the total number of conditions/levels in a factorial design? E.g. 2x3

It's factorial, so multiply the number of levels in each factor: a 2x3 design has 2 x 3 = 6 conditions (cell means). Alternatively, draw the table to help visualise it.
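The multiplication can be checked by listing every cell of the design. This is a small sketch; the factor names and levels are made up for illustration.

```python
from itertools import product
from math import prod

# Hypothetical factors for a 2x3 design; names and levels are illustrative only.
factors = {
    "caffeine": ["placebo", "caffeine"],
    "sleep": ["none", "five hours", "eight hours"],
}

# Total conditions = product of the number of levels in each factor.
n_conditions = prod(len(levels) for levels in factors.values())

# Each cell of the design is one combination of levels.
cells = list(product(*factors.values()))

print(n_conditions)
for cell in cells:
    print(cell)
```

Adding a third factor with two levels would turn this into a 2x3x2 design with 12 cells.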

How to work out what kind of design an ANOVA has!

Note to self: need to work on working out ANOVA designs.
BETWEEN: When different subjects are used for the levels of a factor, the factor is called a between-subjects factor or a between-subjects variable. The term "between subjects" reflects the fact that comparisons are between different groups of subjects. All factors between subjects = a between-subjects design.
WITHIN: When the same subjects are used for the levels of a factor, the factor is called a within-subjects factor or a within-subjects variable. Within-subjects variables are sometimes referred to as repeated-measures variables, since there are repeated measurements of the same subjects. Each subject is tested within each condition.
REPEATED MEASURES and "WITHIN" are synonymous: say "repeated measures", or say both! (Purely within-participants could also potentially be quasi-experimental(?), so say "repeated" to avoid ambiguity.)
REPEATED MEASURES: same part...

What kind of follow-up tests should you use for within-subjects factorial ANOVA?

For follow-up testing within participants, use related (paired) t-tests rather than independent t-tests.

How does multifactorial ANOVA differ with within-participant factors?

Same assumptions as for between-subjects ANOVA, plus the assumption of sphericity (ONLY IF THERE ARE MORE THAN TWO CONDITIONS FOR A WITHIN-SUBJECTS IV).
The constant source of error due to having the same participants in different conditions is subtracted from the error variance, which reduces the error term. This is called "partialling out". We do it because one assumption of the statistical tests (not covered earlier) is that the data from each condition should be independent of all other conditions. To achieve this, the consistent effects of participants across all conditions (e.g. those who tend to perform well will do so over all conditions) are removed statistically, so that the conditions are effectively independent of each other and the analysis can continue.
Extra table: "Within Subjects Effects": because the same participants are in each condition, we are able to calculate the degree of error associated with e...

If you find a significant interaction in ANOVA, what should you do?

Explore that interaction further! How? With simple effects tests: corrected t-tests.
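The notes do not say which correction to use. One common choice is the Bonferroni correction, which divides the alpha level by the number of comparisons. The sketch below assumes that choice; the function names and p-values are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha level under a Bonferroni correction."""
    return alpha / n_comparisons


def significant_after_correction(p_values, alpha=0.05):
    """Flag which follow-up tests stay significant at the corrected alpha."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from three follow-up t-tests.
flags = significant_after_correction([0.01, 0.02, 0.04])
print(flags)
```

With three comparisons the threshold drops from 0.05 to about 0.0167, so only the first of these tests remains significant.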

What are the assumptions of multifactorial ANOVA?

- Normal distribution
- Homogeneity of variance (one way of telling = similar SD values, e.g. 3.97, 3.73, 3.09, 4.22)
- Sphericity (an assumption of within-participants ANOVA): if there are more than two conditions in any of the within-subjects IVs, check whether the assumption has been violated using Mauchly's test of sphericity. If the significance value of Mauchly's test is 0.05 or lower, the assumption of sphericity is violated, so use the Greenhouse-Geisser correction.

Give all the sources of variance for: two way between ANOVA, factors A and B and three way between ANOVA, factors A,B,C

Two-way between ANOVA (factors A and B):
- Main effect A
- Main effect B
- Interaction A and B
- Error
Three-way between ANOVA (factors A, B, C):
- Main effect A
- Main effect B
- Main effect C
- Interaction A and B
- Interaction A and C
- Interaction B and C
- Interaction ABC
- Error (individual differences, experimental error)
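The pattern generalises: every non-empty combination of factors contributes one source (a main effect for single factors, an interaction otherwise), plus one error term. A short sketch with an illustrative helper name:

```python
from itertools import combinations


def sources_of_variance(factors):
    """List the sources of variance for a fully between-subjects factorial ANOVA."""
    sources = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            label = "Main effect" if size == 1 else "Interaction"
            sources.append(f"{label} {' x '.join(combo)}")
    sources.append("Error")
    return sources


two_way = sources_of_variance(["A", "B"])
three_way = sources_of_variance(["A", "B", "C"])
print(two_way)
print(three_way)
```

For k factors this gives 2^k - 1 effects plus the error term: 4 sources for a two-way design and 8 for a three-way design.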

What are the sources of variance in MULTIfactorial ANOVA?

- Some variance attributable to the IVs (their main effects)
- Some variance attributable to their interaction effects (e.g. the Beatles produced music together that they never could have produced alone; the whole is greater than, or different from, the sum of its parts)
- Error variance (experimental error and individual differences, except that individual differences are also partitioned out in repeated/within ANOVA)

When is partial eta squared useful, and when is d useful?

Partial eta squared = a global measure of the magnitude of an effect.
d = the magnitude of the difference between two conditions.
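As a rough sketch of how the two are computed: partial eta squared is taken from the sums of squares in the ANOVA table, while Cohen's d compares two condition means in pooled standard deviation units. The version of d below assumes two independent groups, and the numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def partial_eta_squared(ss_effect, ss_error):
    """Proportion of effect-plus-error variance attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def cohens_d(group_a, group_b):
    """Standardised mean difference for two independent groups (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical sums of squares and scores.
print(partial_eta_squared(20, 80))
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Partial eta squared fits the omnibus F tests for main effects and interactions; d fits the pairwise follow-up comparisons.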

What are the sources of variance in two way ANOVA? (main effects and interactions)

- Variance due to Factor 1
- Variance due to Factor 2
- Variance due to the interaction between these factors
- Error variance ("within-groups" variance)
You need to report the F value (with its associated degrees of freedom and p value) for both of the main effects and for the interaction.

Define simple effect (sometimes called simple main effects). How would you calculate them?

THE SIMPLE PICTURE OF JUST MAIN EFFECTS
If you do get a significant interaction, you can find out what is happening in each of your conditions by analysing the simple effects. Where you find a difference between simple effects, you have spotted an interaction!
* Simple effects show the difference between any 2 conditions of 1 IV in one of the conditions of the other IV.
* Simple effect analyses are equivalent to t-tests but involve the calculation of F values. You can get SPSS to calculate them for you, but this is very complex, so use t-tests instead!
Simple effect = a comparison of two cell means of 1 IV, within one condition of another IV (see diag!)
The more simple effects you calculate, the higher your familywise error rate will be. You should therefore be selective in your simple effects calculations: Your hypothesis (prediction about how cell means will differ (based on pr...

What are the three main types of multifactorial ANOVA?

- Independent/BETWEEN Groups ANOVA = the analysis of unrelated designs, where all factors contain independent samples
- Repeated Measures/WITHIN Groups ANOVA = only related factors are involved (repeated measures or matched pairs)
- Mixed Design ANOVA = both unrelated and related factors are involved

How to abbreviate factorial ANOVA designs(and how to refer to them: x factors = x Way Anova)

Write the number of levels of each factor, separated by "x" (FxF):
- 3x3 ANOVA = two factors, three levels of each factor
- 2x3x2 ANOVA = three factors, with two, three, and two levels respectively
Two-way ANOVA = compares the effects of two factors on 1 DV. No matter how many levels there are in each factor, we will (at most) just find a main effect for each factor and the interaction between them.
Three-way ANOVA = compares the effects of three factors on 1 DV.

What are the benefits of using factorial ANOVA designs?

Factorial designs can test the effect of 2 or more FACTORS on 1 DV at the same time. This enables us to find out if there is an INTERACTION between the factors!
1) A two-factor design moves one step closer to reality by testing the effects of two IVs on a DV simultaneously. [Manipulation of a single IV with all other variables held constant is criticised for its extreme separation from reality; in life we are affected by several influences together at any one time.]
2) By manipulating more than one factor in an experiment, we get to see the ways in which one factor INTERACTS with another, e.g. comparing the effects of caffeine on driving performance, against placebo, after five hours' sleep and after none (2x2 DESIGN).
3) Use of two factors is often demanded by the research question, but is often simply convenient: two experiments in one, plus the interaction effects. E.g. Coffee on driving, sleep on driving, and interaction...

What is a neural pathway?

A neural pathway, or neural tract, connects one part of the nervous system with another via a bundle of axons, the long fibers of neurons. A neural pathway that connects relatively distant areas of the brain or nervous system is usually a bundle of myelinated axons, known collectively as white matter. Shorter pathways between nearby structures are found within grey matter.

What are a few ways you can handle missing values for a feature in your data?

Answer: You can drop the data instances with missing values if there aren't a lot of them. You can replace the missing values with the mean of the variable in many instances. In a time series problem you might use a neighboring value. You could also run some type of clustering algorithm and then use the average value for instances in the same cluster.
EX: There are two columns in a health dataset, 'current weight' and 'heaviest weight ever'. There are values for every instance of the current weight, but many missing values for heaviest weight. One way to handle the missing values is to compute the average percentage difference between heaviest weight and current weight, and then apply that percentage difference to the current weight to estimate the heaviest weight where it is missing.
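The weight example can be sketched in a few lines. The records below are invented, and `None` is assumed to mark a missing value; the same idea carries over to a dataframe library.

```python
from statistics import mean

# Hypothetical records; None marks a missing 'heaviest weight ever'.
records = [
    {"current": 150.0, "heaviest": 165.0},
    {"current": 200.0, "heaviest": 220.0},
    {"current": 180.0, "heaviest": None},
]

complete = [r for r in records if r["heaviest"] is not None]

# Option 1: plain mean imputation, ignoring the current weight.
mean_heaviest = mean(r["heaviest"] for r in complete)

# Option 2: average ratio of heaviest to current weight among complete rows,
# applied to each row's own current weight.
avg_ratio = mean(r["heaviest"] / r["current"] for r in complete)

imputed = [
    {
        **r,
        "heaviest": r["heaviest"]
        if r["heaviest"] is not None
        else r["current"] * avg_ratio,
    }
    for r in records
]

print(mean_heaviest)
print(imputed[2]["heaviest"])
```

The ratio-based estimate uses information from the row itself, so it usually lands closer to a plausible value than the column mean does.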

Explain the difference between bagging and boosting ensemble models.

Answer: Both are ensemble methods that combine multiple models to create the final model. A bagging model (e.g. a random forest) builds each of its component models independently, without any information from the other models, and then aggregates their predictions. Boosted models are built sequentially: the first model is fitted, then information from that model (usually its errors/residuals) is used to build the next model, and so on.

What are a few ways you can evaluate a linear regression model?

Answer: R^2 (coefficient of determination), RMSE (root mean squared error), MAE (mean absolute error), MAPE (mean absolute percentage error)
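For reference, all four metrics can be computed directly from the true and predicted values. This is a minimal sketch with made-up numbers; MAPE is undefined when any true value is zero, which the sketch does not guard against.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE, MAE, and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical true values and predictions.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(metrics)
```

RMSE and MAE are in the units of the target, R^2 is unitless, and MAPE is relative, so it weights an error of 10 on a true value of 100 more heavily than the same error on 400.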

What is selection bias, why is it important and how can you avoid it?

Answer: Selection bias describes the situation where an analysis is conducted on a subset of the data (a sample) with the goal of drawing conclusions about the population, but the resulting conclusions will likely be wrong (biased) because the subgroup differs from the population in some important way. It is usually introduced by a sampling error: the selection for analysis is not properly randomized. It can be avoided by taking a random sample of the population and checking that the subgroup looks like the population along many measures (age, gender, education).

You find your random forest model is overfitting the data. What can you change about your model to reduce this?

Answer: I would reduce the number of features available to each tree. This makes the trees more diverse and less likely to overfit to a particular feature. Additionally, increasing the minimum number of samples per leaf prevents any tree from being fitted too closely to the data.

What is the difference between a left, inner, and outer join?

Answer: An outer join keeps all rows from all of the tables, even if an instance is not present in one of them. An inner join leaves you with only the data instances that are present in all of the combined tables. Finally, a left join keeps every instance from the original (left) table, even if it is not present in the additionally joined tables, and drops instances that appear only in those other tables.
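The three joins differ only in which keys survive. The sketch below uses two small dictionaries as stand-ins for tables and assumes one row per key; the table contents and the `join` helper are hypothetical, not a database API.

```python
# Hypothetical tables keyed by customer id (one row per key).
customers = {1: "Ana", 2: "Ben", 3: "Cy"}
orders = {2: "book", 3: "lamp", 4: "desk"}


def join(left, right, how):
    """Join two keyed tables; missing sides are filled with None."""
    if how == "inner":
        keys = left.keys() & right.keys()
    elif how == "left":
        keys = left.keys()
    elif how == "outer":
        keys = left.keys() | right.keys()
    else:
        raise ValueError(f"unknown join type: {how}")
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}


print(join(customers, orders, "inner"))
print(join(customers, orders, "left"))
print(join(customers, orders, "outer"))
```

Customer 1 has no order and order 4 has no customer, so the inner join returns two rows, the left join three, and the outer join four.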

What are two models you can use for classification problems, and when would you use one instead of the other?

Answer: Logistic regression and random forests are two models that can be used for classification. Often I would try both and see which one performs better. If the output of the model must be easily interpretable, I would choose logistic regression over the random forest, because it outputs coefficients that I can interpret.

What are the basic assumptions to be made for linear regression?

Answer: The assumptions of linear regression are (1) a linear association between the input and output variables, (2) normally distributed errors, and (3) independence of the error term from the inputs. Constant error variance (homoscedasticity) and independence of the errors from one another are usually listed as well.

How would you explain a linear regression to a business executive?

Answer: Linear regression models are used to show or predict the relationship between an outcome and the factors that influence it. The factor being predicted (the one the equation solves for) is called the dependent variable. The factors used to predict its value are called the independent variables. You can use linear regression to predict a continuous variable (salary) while taking into account the variables that explain it (education, experience, occupation). You may have heard something along the lines of "Women in the US earn 77% of what men earn, but if you account for different factors like experience, occupation, etc., that number becomes 91%."

What does P-value signify about the statistical data?

Answer: The P value, or calculated probability, is the probability of finding results at least as extreme as the observed ones when the null hypothesis (H0) of a study question is true. In layman's terms, it tells you how surprising your results would be if nothing but random sampling variation were at work. A small P value means the results would be unlikely under chance alone; it is not the probability that the results are real or that the null hypothesis is true.
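One way to see this definition in action is an exact permutation test: if the null hypothesis is true, the group labels are arbitrary, so we can count how many relabelings give a difference at least as extreme as the one observed. This is a sketch of that one technique with invented data, practical only for small samples, not a general recipe for computing P values.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling len(group_a) observations as group A.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements for two groups.
p = permutation_p_value([1, 2, 3], [4, 5, 6])
print(p)
```

Here only 2 of the 20 possible relabelings are as extreme as the observed split, so the P value is 0.1.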

What is the difference between Supervised Learning and Unsupervised Learning?

Answer: In supervised learning you know what your target variable is, and your data set has labels for that variable. The goal of supervised learning is therefore to learn a function that, given a sample of data and desired outputs, best approximates the relationship between input and output observable in the data. Unsupervised learning, on the other hand, does not have labeled outputs, so its goal is to infer the natural structure present within a set of data points.

What is the goal of A/B Testing?

Answer: A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The objective is to detect any difference in outcomes between the A group and the B group. For example, if you want to test whether a new landing page leads to more sales, you would set up an A/B test where half of the visitors see the old page and half see the new page. Then you use a statistical test to see whether the actions of those visitors were different.
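For the landing page example, one commonly used statistical test is the two-proportion z-test. The sketch below assumes that test, a two-sided alternative, and made-up visitor counts; other tests (chi-squared, Fisher's exact) are equally reasonable choices.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical results: 100 of 1000 visitors buy on the old page (A),
# 130 of 1000 buy on the new page (B).
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(z, p_value)
```

The z-test relies on a normal approximation, so it is only appropriate when each group has a reasonably large number of conversions and non-conversions.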

What is a recommendation engine? How does it work?

Answer: Recommendation engines are data filtering tools that use algorithms and data to recommend the most relevant items to a particular user. In simple terms, they are an automated form of a "shop counter guy": you ask him for a product, and he shows you not only that product but also related ones you could buy. There are three main types:
- Collaborative Filtering
- Content-Based Filtering
- Hybrid Recommendation Systems

Explain cross-validation, both the process and why you do it.

Answer: Cross-validation is an effective tool to measure the accuracy of your model and check whether it is underfitting or overfitting. It is also useful for choosing the hyperparameters of the model: you use cross-validation to determine which settings result in the lowest test error. It works by splitting your data into multiple groups, then training your model on some of the groups and validating it on another group, rotating which group is held out.
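The splitting step can be sketched as a k-fold index generator. The function name is illustrative, and shuffling is left out for simplicity; in practice the data are usually shuffled before splitting unless order matters, as in time series.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


for training, validation in k_fold_indices(10, 3):
    print(len(training), validation)
```

Each observation is used for validation exactly once, and the validation scores from the k rounds are averaged to estimate the test error.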

What is the curse of dimensionality?

Answer: As you increase the number of dimensions in your feature space, standard computational and statistical techniques become less effective. Your models will require more computational power to fit and more observations of data. When fitting a model, you assume that the data sample is representative of the population. The more features you have relative to the number of data instances, the less confidently you can say that this assumption holds.

What is regularization and what kind of problems does regularization solve?

Answer: Regularization is used to help prevent you from overfitting your model. It does this by introducing a penalty term for the size of the coefficients in your model.
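A small worked case shows the penalty at work. For a single feature with an unpenalized intercept, an L2 (ridge) penalty has a closed-form slope of Sxy / (Sxx + penalty). The sketch below assumes that specific setup; the function name and data are illustrative.

```python
def ridge_slope(x, y, penalty):
    """Slope of a one-feature ridge regression with an unpenalized intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # The penalty inflates the denominator, shrinking the slope toward zero.
    return sxy / (sxx + penalty)


# Hypothetical data lying exactly on y = 2x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

print(ridge_slope(x, y, penalty=0))
print(ridge_slope(x, y, penalty=5))
```

With no penalty the slope is the ordinary least squares estimate; increasing the penalty shrinks it, trading a little bias for lower variance.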

Please explain the bias-variance tradeoff.

Answer: The bias-variance tradeoff is essentially a question of how complex you would like to make your model. The more complex your model, the more its fit can vary depending on the sample of data. This is high variance, and you could be overfitting your model. A simpler model reduces the likelihood of this, but it increases the chance of underfitting and of the model being biased towards the features selected for it.

What are various steps involved in an analytics project?

Answer:
- Look at the big picture
- Get the data
- EDA (exploratory data analysis)
- Data prep (cleaning and feature engineering)
- Select a model
- Fine-tune your model (test metrics and hyperparameter tuning)
- Present your solution
- Launch, monitor, and maintain your system

Why did you switch careers to become a data scientist?

Answer: 30 Second Elevator Pitch