## In the spotlight: Using Python within Stata

Did you know that you can use Python within Stata? Well, you can—you can use Python from within Stata to access data with Internet API calls, process JSON data, create three-dimensional graphics, use machine learning and artificial intelligence algorithms to find patterns in your data, and much more. And the best part is that you can seamlessly move between Stata and Python to perform tasks such as these right alongside the analyses that you already perform using Stata.

I recently started writing a series of blog posts to help you get started using Python from within Stata. The first post shows you how to set up Stata to use Python, and the other posts help you get started with Python and then demonstrate interesting ways you can use it. Before you read the blog posts, I want to show you one way that you can take advantage of the Python integration in Stata. Here I use Stata to estimate marginal predictions from a logistic regression model and then Python to create a three-dimensional surface plot of those predictions.

I first fit a logistic regression model using data from the
National Health and Nutrition Examination Survey (NHANES). The dependent
variable, **highbp**, is an indicator for high blood pressure, and I included
the main effects and interaction of the continuous variables **age** and
**weight**.

. webuse nhanes2, clear. svy: logistic highbp c.age##c.weight(running logistic on estimation sample Survey: Logistic regression Number of strata = 31 Number of obs = 10,351 Number of PSUs = 62 Population size = 117,157,513 Design df = 31 F( 3, 29) = 418.97 Prob > F = 0.0000

Linearized | ||

highbp | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval] | |

age | 1.100678 .0088786 11.89 0.000 1.082718 1.118935 | |

weight | 1.07534 .0063892 12.23 0.000 1.062388 1.08845 | |

c.age#c.weight | .9993975 .0001138 -5.29 0.000 .9991655 .9996296 | |

_cons | .0002925 .0001194 -19.94 0.000 .0001273 .0006724 | |

The estimated odds ratio for the interaction of **age** and **weight**
is 0.9994, and the *p*-value for the estimate is 0.000. Interpreting this result
is challenging because the odds ratio is essentially equal to the null
value of 1, but the *p*-value is essentially 0. Is the interaction effect meaningful? It is often easier to determine this
by looking at the predicted probability of the outcome at different levels
of the covariates rather than looking only at this odds ratio.

The code block below uses
margins
to estimate the predicted
probability of hypertension for all combinations of **age**
and **weight** for values of **age** ranging from 20 to 80
years in increments of 5 and for values of **weight** ranging
from 40 to 180 kilograms in increments of 5. The option
**saving(predictions, replace)** saves the predictions to a
dataset named **predictions**. I then use the dataset
**predictions**, rename three variables in the dataset, and
save it again.

webuse nhanes2, clear svy: logistic highbp age weight c.age#c.weight quietly margins, at(age=(20(5)80) weight=(40(5)180)) vce(unconditional) saving(predictions, replace) use predictions, clear rename _at1 age rename _at2 weight rename _margin pr_highbp save predictions, replace

In a previous *Stata News* article,
I used
graph twoway contour
to create a contour plot of the predicted probability of hypertension saved in
**predictions**. Today, I'm going to use Python to create a three-dimensional
surface plot of the predicted probability.

You can run the Python code below in a Stata do-file after you have
set up Stata to use Python. The Python code block begins with
python: and
ends with **end**. I have included comments in the Python code to give you
clues about the purpose of each collection of Python statements. You can
read a detailed explanation of the Python code at the
Stata Blog.

python: # Import the necessary Python packages import numpy as np import pandas as pd import matplotlib import matplotlib.pyplot as plt matplotlib.use('TkAgg') # Read (import) the Stata dataset "predictions.dta" # into a pandas data frame named "data" data = pd.read_stata("predictions.dta") # Define a 3-D graph named "ax" ax = plt.axes(projection='3d') # Specify the axis ticks ax.set_xticks(np.arange(20, 100, step=10)) ax.set_yticks(np.arange(40, 220, step=40)) ax.set_zticks(np.arange( 0, 1, step=0.2)) # Specify the title and axis titles ax.set_title("Probability of Hypertension by Age and Weight") ax.set_xlabel("Age (years)") ax.set_ylabel("Weight (kg)") ax.zaxis.set_rotate_label(False) ax.set_zlabel("Probability of Hypertension", rotation=90) # Specify the view angle of the graph ax.view_init(elev=30, azim=240) # Render the graph ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'], cmap=plt.cm.Spectral_r, edgecolor='none') # Save the graph plt.savefig("Margins3d_3.png", dpi=1200) end

The resulting three-dimensional surface plot shows the predicted
probability of hypertension for values of **age** and **weight** and allows us to visualize
their interaction. The
probabilities are indicated by the height of the surface on the *z* axis
and by the color of the surface. Blue indicates lower probabilities of
hypertension, and red indicates higher probabilities of hypertension.

All the Stata commands and Python code in this example can be run in the same do-file. And three-dimensional surface plots are just one of the many fun things you can do with Stata and Python.

Ready to try it for yourself? My blog posts will show you how to get started:

- Setting up Stata to use Python
- Three ways to use Python in Stata
- How to install Python packages
- How to use Python packages
- Three-dimensional surface plots of marginal predictions
- Using APIs and JSON data with Python
- Machine learning and artificial intelligence with Python

Stay tuned into the Stata Blog for more posts on exciting ways to use Python within Stata. And even better, explore ways to take advantage of Stata/Python integration in your own research.

— Chuck Huber

Director of Statistical Outreach