Stata's strength lies in the analysis of time-based data. It provides not only basic time series models such as ARIMA but also their multivariate equivalents (VAR and VEC models). You can also model volatility with GARCH models. Kaplan-Meier curves are available for analysing survival times, and mixed models help with the analysis of panel data. A powerful scripting language completes the package.
Stata covers the full range of classical statistics. You can use it for descriptive statistics, hypothesis testing, and data visualization. Stata is typically used in research and development. Its large number of statistical methods helps scientists in all fields of application (social science, econometrics, epidemiology, medical research).
Whether you are a student or a senior researcher, there is a suitable edition of Stata: Stata/BE, Stata/SE, or Stata/MP.
Arguments for Stata:
- Used in research and development
- Wide range of statistical and graphical methods
- Comprehensive statistical software
- Flexible and especially powerful for analysis of time series
- Easy-to-learn yet powerful scripting language
Stata statistical software is a complete, integrated statistical software package that provides everything you need for data analysis, data management, and graphics. Stata is not sold in modules, which means you get everything you need in one package.
Easy to learn yet fully programmable for the most demanding data management and statistical requirements.
With Stata's menus and dialogs, you can easily point and click or drag and drop your way to all of Stata's statistical, graphical, and data management features. You can completely reshape your data, create group-level variables for panel or longitudinal data, graph a receiver operating characteristic (ROC) curve or impulse-response function (IRF), perform a case-control analysis, estimate a random-effects count-data model or a Cox proportional hazards model, or compute marginal effects from a nonlinear estimator. You can even access the dialog boxes for each command directly from the online help system. This is a great way to explore all of the capabilities of Stata.
Stata Software is available in 3 different flavours
Whether you’re a student or a seasoned research professional, we have a package designed to suit your needs:
- Stata/MP: The fastest version of Stata (for quad-core, dual-core, and multicore/multiprocessor computers) that can analyze the most data
- Stata/SE: Stata for large datasets
- Stata/BE: Stata for mid-sized datasets
- Numerics by Stata: Stata for embedded and web applications
Stata/MP is the fastest and largest version of Stata. Virtually any current computer can take advantage of the advanced multiprocessing of Stata/MP. This includes the Intel i3, i5, i7, i9, Xeon, and Celeron, and AMD multi-core chips. On dual-core chips, Stata/MP runs 40% faster overall and 72% faster where it matters, on the time-consuming estimation commands. With more than two cores or processors, Stata/MP is even faster.
Stata/MP, Stata/SE, and Stata/BE all run on any machine, but Stata/MP runs faster. You can purchase a Stata/MP license for up to the number of cores on your machine (maximum is 64). For example, if your machine has eight cores, you can purchase a Stata/MP license for eight cores, four cores, or two cores.
Stata/MP can also analyze more data than any other flavor of Stata. Stata/MP can analyze 10 to 20 billion observations given the current largest computers, and is ready to analyze up to 1 trillion observations once computer hardware catches up.
Stata/SE and Stata/BE differ only in the dataset size that each can analyze. Stata/SE and Stata/MP can fit models with more independent variables than Stata/BE (up to 10,998). Stata/SE can analyze up to 2 billion observations.
Stata/BE allows datasets with as many as 2,048 variables and 2 billion observations. Stata/BE can have at most 798 independent variables in a model.
Numerics by Stata can support any of the data sizes listed above in an embedded environment.
All the above flavors have the same complete set of features and include PDF documentation.
Product features | Stata/BE | Stata/SE | Stata/MP |
---|---|---|---|
Maximum number of variables | 2,048 | 32,767 | 120,000 |
Maximum number of observations * | 2.14 billion | 2.14 billion | Up to 20 billion |
Maximum number of independent variables | 798 | 10,998 | 10,998 |
Multicore support (time to run logistic regression with 5 million observations and 10 covariates) | 1 core (10.0 sec) | 1 core (10.0 sec) | 2 cores (5.0 sec), 4 cores (2.6 sec), more than 4 cores (even faster) |
Complete suite of statistical features | Yes! | Yes! | Yes! |
Publication-quality graphics | Yes! | Yes! | Yes! |
Matrix programming language | Yes! | Yes! | Yes! |
Complete PDF documentation | Yes! | Yes! | Yes! |
Exceptional technical support | Yes! | Yes! | Yes! |
Includes within-release updates | Yes! | Yes! | Yes! |
64-bit version available | Yes! | Yes! | Yes! |
Windows, macOS, and Linux | Yes! | Yes! | Yes! |
Memory requirements | 1 GB | 2 GB | 4 GB |
Disk space requirements | 1 GB | 1 GB | 1 GB |
* The maximum number of observations is limited only by the amount of available RAM on your system.
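To put the multicore timings in the table into perspective, the following minimal Python sketch computes the speedup and per-core efficiency implied by the quoted run times (10.0, 5.0, and 2.6 seconds). Only these three timings come from the table; the dictionary and function names are illustrative assumptions and are not part of Stata.

```python
# Run times in seconds, as quoted in the table, for a logistic regression
# with 5 million observations and 10 covariates, keyed by number of cores.
timings = {1: 10.0, 2: 5.0, 4: 2.6}


def speedup(cores: int) -> float:
    """Single-core run time divided by the run time on `cores` cores."""
    return timings[1] / timings[cores]


def efficiency(cores: int) -> float:
    """Speedup per core; 1.0 would mean perfectly linear scaling."""
    return speedup(cores) / cores


for cores in sorted(timings):
    print(f"{cores} core(s): speedup {speedup(cores):.2f}x, "
          f"efficiency {efficiency(cores):.0%}")
```

On these figures, two cores give a 2.00x speedup and four cores about 3.85x, which is roughly 96% per-core efficiency.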
Stata scripting language
Stata's scripting language is easy to learn and helps you get the most out of your data. It lets you use and modify existing routines to generate standard reports, and it can easily be extended with newly created statistical functions.
Efficient data management with Stata
Data management with Stata is easy and efficient. Joining datasets, creating new variables, and producing summary tables take very little time.
Professional Graphics with Stata
Stata provides professional graphics that can be used directly in documents and publications. This includes not only predefined standard graphs but also highly customizable graphics.
Further Information:
https://www.stata.com/why-use-stata/
Trial version of Stata
The producer provides a free 30-day trial version on its website. The trial version contains all the features of Stata. You can register for this license by visiting the following link: http://www.stata.com/customer-service/evaluate-stata/
Compatible operating systems
Stata will run on the platforms listed below. While Stata software is platform-specific, your Stata license is not; therefore, you need not specify your operating system when placing your order for a license.
Platforms
- Windows 10 *
- Windows 8 *
- Windows Server 2019, 2016, 2012 *
* Stata requires 64-bit Windows for x86-64 processors made by Intel® or AMD
- Mac with Apple Silicon or 64-bit Intel processor
- macOS 11.0 (Big Sur) or newer for Macs with Apple Silicon and macOS 10.12 (Sierra) or newer for Macs with 64-bit Intel processors
- Any 64-bit (x86-64 or compatible) computer running Linux
- For xstata, you need to have GTK 2.24 installed
Hardware requirements
Package | Memory | Disk space |
---|---|---|
Stata/MP | 4 GB | 2 GB |
Stata/SE | 2 GB | 2 GB |
Stata/BE | 1 GB | 2 GB |
Stata for Linux requires a video card that can display thousands of colors or more (16-bit or 24-bit color)
What's new in Stata?
Tables
Customize your tables of
- Summary statistics
- Results from hypothesis tests
- Regression results
- LR and Wald tests, GOF statistics, ...
- Results from any Stata command
Export to
- Word, Excel
- LaTeX
- HTML, Markdown
- and more
Bayesian econometrics
Bayesian
- VAR models
- IRF and FEVD analysis
- Dynamic forecasting
- Panel/longitudinal-data models
- Linear and nonlinear DSGE models
PyStata—Python and Stata
- Call Python from Stata.
- Call Stata from Python.
- Exchange data, metadata, and results seamlessly.
- Use Stata from Jupyter Notebook, Spyder, PyCharm IDE, and more.
Jupyter Notebook with Stata
- Invoke Stata and Mata from Jupyter Notebook.
- Easily reproduce your work and collaborate with others.
- Access results from Stata analyses within Python.
- Stata output, graphs, and tables seamlessly integrate with your Jupyter Notebook.
Difference-in-differences (DID) and DDD models
- Evaluate the effect of a policy, a treatment, or an intervention.
- Control for confounding unobserved group and time characteristics.
- Use panel data or repeated cross-sections.
- Use DID. In vogue since 1855.
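The idea behind a basic two-group, two-period DID estimate can be sketched in a few lines of Python. This is a simplified illustration with hypothetical mean outcomes, not Stata's estimator; the function name `did_estimate` and all numbers are assumptions made for this example.

```python
def did_estimate(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Change in the treated group minus change in the control group.

    Subtracting the control group's change removes time effects common to
    both groups; differencing within each group removes fixed group
    differences.
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical mean outcomes before and after a policy change.
effect = did_estimate(treated_before=20.0, treated_after=27.0,
                      control_before=18.0, control_after=21.0)
print(effect)  # (27 - 20) - (21 - 18) = 4.0
```

The treated group improved by 7 and the control group by 3, so under the usual parallel-trends assumption the estimated effect of the intervention is 4.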
Faster Stata
Stata is fast, and keeps getting faster.
- Faster sort and collapse
- Faster mixed models
- Faster estimation commands
- Faster import delimited
- And more
Interval-censored Cox model
You want to model time to an event.
But you don't know the exact event times—only the intervals in which events happen.
And you don't want to make parametric assumptions.
Try an interval-censored Cox model.
Multivariate meta-analysis
Do you have multiple effect sizes?
Do they share a common control group?
Do they share the same group of subjects?
Multivariate meta-analysis can help.
Bayesian VAR models
You fit your VAR models with var.
You fit your Bayesian regression models with bayes:.
Now fit your Bayesian VAR models with bayes: var.
Bayesian multilevel modeling
Nonlinear, joint, SEM-like, and more.
More multilevel models.
More powerful.
Easier to use.
Treatment-effects lasso estimation
When you want:
Causal inference, average treatment effects, potential-outcome means, double-robust estimation
And you have:
Many (maybe hundreds or thousands of) potential covariates
Use treatment-effects estimation with lasso variable selection.
New functions for dates and times
- Calculate durations, such as ages and other differences between datetimes.
- Calculate relative dates, or dates from other dates, such as the previous or next birthday or anniversary relative to a reference date.
- Extract individual components from datetime values and variables.
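As an illustration of the first kind of calculation, here is a minimal Python sketch that computes an age in completed years between two dates. It is not Stata's implementation; the function name `age_in_years` and the treatment of 29 February birthdays (counted from 1 March in non-leap years) are assumptions of this example.

```python
from datetime import date


def age_in_years(birth: date, reference: date) -> int:
    """Completed years between `birth` and `reference`."""
    years = reference.year - birth.year
    # If the birthday has not yet occurred in the reference year,
    # the last year is not complete.
    if (reference.month, reference.day) < (birth.month, birth.day):
        years -= 1
    return years


print(age_in_years(date(2000, 6, 15), date(2024, 6, 14)))  # day before birthday: 23
print(age_in_years(date(2000, 6, 15), date(2024, 6, 15)))  # on the birthday: 24
```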
Leave-one-out meta-analysis
Are there influential studies in your data?
Use leave-one-out meta-analysis to find out.
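The leave-one-out idea can be sketched in Python as follows: pool the study effects with inverse-variance weights (a common-effect model), then repeat the pooling with each study omitted in turn. The data are hypothetical and the function names are assumptions for this illustration; this is not Stata's implementation.

```python
def pooled_effect(effects, std_errors):
    """Inverse-variance weighted mean of the study effect sizes."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, std_errors):
    """Pooled effect recomputed with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_errors = std_errors[:i] + std_errors[i + 1:]
        results.append(pooled_effect(rest_effects, rest_errors))
    return results


# Hypothetical effect sizes and standard errors; the fourth study is unusual.
effects = [0.10, 0.12, 0.08, 0.90]
std_errors = [0.05, 0.05, 0.05, 0.05]

overall = pooled_effect(effects, std_errors)
print(f"all studies: {overall:.2f}")
for i, estimate in enumerate(leave_one_out(effects, std_errors), start=1):
    print(f"omitting study {i}: {estimate:.2f}")
```

Here omitting the fourth study moves the pooled estimate from 0.30 to 0.10, which flags that study as influential.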
Galbraith plots
Graphically summarize meta-analysis results
- Study-specific effect sizes
- Precision of effect sizes
- Overall effect size
Detect potential outliers
Assess heterogeneity
Panel-data multinomial logit model
You can model categorical outcomes with mlogit.
You can model panel data with xt.
Now you can do both!
Stata's new xtmlogit command models categorical outcomes that change over time.
Bayesian panel-data models
Bayesian analysis lets you answer probabilistic questions with panel-data models.
- How likely is it that an extra year of schooling will increase wages?
- What is the probability of default for a low-risk portfolio?
Incorporate prior knowledge, see posterior distributions of random effects, compute Bayesian predictions, and more.
Zero-inflated ordered logit model
Need to model an ordinal outcome?
Have excess zeros (or responses in the lowest category)?
ziologit is the answer.
Nonparametric tests for trend
Do responses have an increasing or decreasing trend? Find out using one of four nonparametric tests for trend:
- Cochran–Armitage test
- Jonckheere–Terpstra test
- Linear-by-linear test
- Cuzick's test with ranks
Bayesian IRF and FEVD analysis
What is the effect of a shock over time?
What is the mean or median of the effect for a distribution of probable scenarios?
Bayesian IRF analysis answers these and more.
Bayesian dynamic forecasting
After VAR, you want a dynamic forecast.
After Bayesian estimation, you want statistics of posterior distributions.
Estimate both. Visualize both.
Lasso with clustered data
Your data have many variables.
Your data have clusters of observations.
Your lasso for prediction, model selection, or inference can now select variables while accounting for clustering.
BIC for lasso penalty selection
Which variables should lasso include?
BIC for lasso penalty selection can tell you.
Bayesian linear and nonlinear DSGE models
Forming rational expectations of the future is hard.
DSGE models include these expectations.
Prior information helps.
Do-file Editor enhancements
- Persistent bookmarks
- Navigation Control
- Syntax highlighting for Java, XML, and more
- Auto-completion for quotes, parentheses, and brackets
Stata on Apple Silicon
- Native M1 processor support
- Universal application for both Intel and Apple Silicon Macs
- One license, both kinds of hardware
Intel Math Kernel Library (MKL)
Mata functions and operators use heavily optimized LAPACK routines underpinned by the Intel Math Kernel Library.
Use your favorite Stata commands like always; underlying functions are faster, so you get results faster.
Java integration
- Use Java interactively (like JShell) from within Stata.
- Embed Java code in do-files.
- Embed Java code in ado-files.
- Compile and execute Java code "on the fly" without external programs.
H2O integration
- Start a new H2O cluster or connect to an existing one.
- Manipulate data on an H2O cluster.
- Access the capabilities of H2O directly in Stata.
JDBC
Connecting Stata to databases is now easier.
Want to access data from Oracle, MySQL, Amazon Redshift, Snowflake, Microsoft SQL Server, and others?
Use jdbc.
Want one driver that works on Windows, Mac, and Linux?
Use jdbc.
Spatial autoregressive models
Because sometimes where you are matters.
Nonlinear multilevel
Mixed logit models: Advanced choice modeling
Do you walk to work, ride a bus, or drive your car? Which of three insurance plans do you buy? Which political party do you vote for? We make dozens of choices every day. Researchers have access to gaggles of data about those choices. Mixed logit introduces random effects into choice modeling and thereby relaxes the IIA assumption and increases model flexibility.
Nonparametric regression
When you know something matters. But have no idea how.
Create Word documents from Stata
Bayesian multilevel models
Small number of groups? Consider Bayesian multilevel modeling.
Threshold regression
Your time-series regression may change parameters at some point in time or at multiple points in time. The activity of foraging animals might follow a completely different pattern at temperatures above some threshold. You may not know the value of that threshold. Finding such thresholds and estimating the parameters within the regimes is what threshold regression does.
Panel-data tobit with random coefficients
Stata has long had estimators for random effects (random intercepts) in panel data.
Search, browse, and import FRED data
The St. Louis Federal Reserve makes available over 470,000 U.S. and international economic and financial time series. You can now easily search, browse, and import these data.
Multilevel regression for interval-measured outcomes
Incomes are sometimes recorded in groupings, as are people's weights, insect counts, grade-point averages, and hundreds of other measures. Often we have repeated measurements for individuals, or schools, or orchards, etc. So ... we need multilevel regression for interval-measured (interval-censored) outcomes.
Multilevel tobit regression for censored outcomes
Panel-data cointegration tests
Tests for multiple breaks in time series
Multiple-group generalized SEM
Generalized SEM now supports multiple-group analysis. Easily specify groups and test parameter invariance across groups.
ICD-10-CM/PCS
Power for cluster randomized designs
Power analysis for designs in which you randomize clusters instead of individuals.
Power for linear regression models
Heteroskedastic linear regression
Poisson models with sample selection
Counts are common. How many fish did you catch? How many accidents occurred? How many patents does a firm generate? Outcomes are not always seen. Folks evade the game warden. Accidents are not always reported. Some firms prefer trade secrets to patents. So you need Poisson models with sample selection.
More in panel data
- Nonlinear models with random effects, including random coefficients
- Bayesian panel-data models
- Interval regression with random intercepts and random coefficients
More in graphics
- Transparency in graphs
- SVG export
More in statistics
- Bayesian survival models
- Zero-inflated ordered probit
- Add your own power and sample-size methods
- Bayesian sample-selection models
- And yet more
More in the interface
- Stata in Swedish
- Stata in Chinese
- Improvements to the Do-file Editor
And even more
- Stream random-number generator
- Improvements for Java plugins
You can find the complete feature list at the following link:
https://www.stata.com/features/
Stata Features
Data management
data transformations, match-merge, ODBC, XML, by-group processing, append files, sort, row–column transposition, labeling, saving results
Basic statistics
summaries, cross-tabulations, correlations, t tests, equality-of-variance tests, tests of proportions, confidence intervals, factor variables
Linear models
regression; bootstrap, jackknife, and robust Huber/White/sandwich variance estimates; instrumental variables; three-stage least squares; constraints; quantile regression; GLS
Multilevel mixed-effects models
generalized linear models; continuous, binary, and count outcomes; two-, three-, and higher-level models; random intercepts; random slopes; crossed random effects; BLUPs of effects and fitted values; hierarchical models; residual error structures; support for survey data in linear models
Binary, count, and discrete outcomes
logistic, probit, tobit; Poisson and negative binomial; conditional, multinomial, nested, ordered, rank-ordered, and stereotype logistic; multinomial probit; zero-inflated and left-truncated count models; selection models; marginal effects
Longitudinal data/panel data
random and fixed effects with robust standard errors; linear mixed models, random-effects probit, GEE, random- and fixed-effects Poisson, dynamic panel-data models, and instrumental-variables regression; panel unit-root tests; AR(1) disturbances
Generalized linear models (GLMs)
ten link functions, user-defined links, seven distributions, ML and IRLS estimation, nine variance estimators, seven residuals
Nonparametric methods
Wilcoxon-Mann-Whitney, Wilcoxon signed ranks and Kruskal-Wallis tests; Spearman and Kendall correlations; Kolmogorov-Smirnov tests; exact binomial CIs; survival data; ROC analysis; smoothing; bootstrapping
Exact statistics
exact logistic and Poisson regression, exact case-control statistics, binomial tests, Fisher's exact test for r × c tables
ANOVA/MANOVA
balanced and unbalanced designs; factorial, nested, and mixed designs; repeated measures; marginal means; contrasts
Multivariate methods
factor analysis, principal components, discriminant analysis, rotation, multidimensional scaling, Procrustean analysis, correspondence analysis, biplots, dendrograms, user-extensible analyses
Cluster analysis
hierarchical clustering; kmeans and kmedian nonhierarchical clustering; dendrograms; stopping rules; user-extensible analyses
Resampling and simulation methods
bootstrapping, jackknife and Monte Carlo simulation; permutation tests
Tests, predictions, and effects
Wald tests; LR tests; linear and nonlinear combinations, predictions and generalized predictions, marginal means, least-squares means, adjusted means; marginal and partial effects; forecast models; Hausman tests
Graphics
line charts, scatterplots, bar charts, pie charts, hi-lo charts, regression diagnostic graphs, survival plots, nonparametric smoothers, distribution Q-Q plots
Survey methods
multistage designs; bootstrap, BRR, jackknife, linearized, and SDR variance estimation; poststratification; DEFF; predictive margins; means, proportions, ratios, totals; summary tables; regression, instrumental variables, probit, Cox regression
Survival analysis
Kaplan-Meier and Nelson-Aalen estimators; Cox regression (frailty); parametric models (frailty); competing risks; hazards; time-varying covariates; left- and right-censoring; Weibull, exponential, and Gompertz analysis
Epidemiology
standardization of rates, case–control, cohort, matched case-control, Mantel-Haenszel, pharmacokinetics, ROC analysis, ICD-9-CM
Time series
ARIMA; ARFIMA; ARCH/GARCH; VAR; VECM; multivariate GARCH; unobserved components model; dynamic factors; state-space models; business calendars; correlograms; periodograms; forecasts; impulse-response functions; unit-root tests; filters and smoothers; rolling and recursive estimation
Multiple imputation
nine univariate imputation methods; multivariate normal imputation; chained equations; explore pattern of missingness; manage imputed datasets; fit model and pool results; transform parameters; joint tests of parameter estimates; predictions
Simple maximum likelihood
specify likelihood using simple expressions; no programming required; survey data; standard, robust, bootstrap, and jackknife SEs; matrix estimators
Programmable maximum likelihood
user-specified functions; NR, DFP, BFGS, BHHH; OIM, OPG, robust, bootstrap, and jackknife SEs; Wald tests; survey data; numeric or analytic derivatives
Other statistical methods
kappa measure of interrater agreement; Cronbach's alpha; stepwise regression; tests of normality
Programming features
adding new commands; command scripting; object-oriented programming; menu and dialog-box programming; Project Manager; plugins
Matrix programming (Mata)
interactive sessions, large-scale development projects, optimization, matrix inversions, decompositions, eigenvalues and eigenvectors, LAPACK engine, real and complex numbers, string matrices, interface to Stata datasets and matrices, numerical derivatives, object-oriented programming
Internet capabilities
ability to install new commands, web updating, web file sharing, latest Stata news
Accessibility
Section 508 compliance, accessibility for persons with disabilities
Sample session
A sample session of Stata for Mac, Unix, or Windows.
User-written commands
User-written commands for meta-analysis, data management, survival, econometrics
Graphical user interface
menus and dialogs for all features; Data Editor; Variables Manager; Graph Editor; Project Manager; Do-file Editor; Clipboard Preview Tool; multiple preference sets
Graphics
line charts; scatterplots; bar charts; pie charts; hi-lo charts; contour plots; GUI Editor; regression diagnostic graphs; survival plots; nonparametric smoothers; distribution Q-Q plots
Documentation
20 manuals; 11,000+ pages; seamless navigation; thousands of worked examples; methods and formulas; references
Power and sample size
power; sample size; effect size; minimum detectable effect; means; proportions; variances; correlations; case-control studies; cohort studies; survival analysis; balanced or unbalanced designs; results in tables or graphs
Treatment effects
inverse probability weight (IPW); doubly robust methods; propensity score matching; regression adjustment; covariate matching; multilevel treatments; average treatment effects (ATEs); average treatment effects on the treated (ATETs); potential-outcome means (POMs)
SEM (Structural equation modeling)
graphical path diagram builder; standardized and unstandardized estimates; modification indices; direct and indirect effects; continuous, binary, count, and ordinal outcomes (GLM); multilevel models; random slopes and intercepts; factors scores, empirical Bayes, and other predictions; groups and tests of invariance; goodness of fit; handles MAR data by FIML; correlated data
Functions
statistical; random-number; mathematical; string; date and time
Embedded statistical computations
Numerics by Stata
Contrasts, pairwise comparisons, and margins
compare means, intercepts, or slopes; compare to reference category, adjacent category, grand mean, etc.; orthogonal polynomials; multiple comparison adjustments; graph estimated means and contrasts; interaction plots
GMM and nonlinear regression
generalized method of moments (GMM); nonlinear regression