How to Write Python Scripts to Analyze JSON APIs and Sort Results



Hey there, how's it going, everybody? I was putting together a tutorial on how to use Homebrew, which is a package manager for macOS, and while making that video I was looking through the packages available in Homebrew. That's the page I have open here: they list all of the packages in their core repository right in the browser, with a short description of each one, and if I click on one of these packages, it takes me to a page with more details about it.

One interesting piece of information on that page, if I scroll down a bit, is the analytics section, which shows the total number of installations for three time frames: the last 30 days, the last 90 days, and the last 365 days. So I started wondering whether I could find out what their most popular packages were. I didn't see any sorting functionality on their site, so I assumed that information wasn't available. It was, but I'll get to that in a second. Since I thought it didn't exist, I figured it would make a good real-world project where we build a solution on our own, so I went ahead and wrote a script that sorts these packages by popularity.

After I did that, I discovered that they actually do have this information if you look hard enough for it, and that's the page I have open now. Their analytics page has pages for their most installed packages. Again, I found this pretty hard to find, which is why I didn't know about it before writing my own script: it isn't linked anywhere from the page that lists their packages, and on their website it sits at the very bottom, close to the footer. But it does exist. So if I click on the install-on-request events for the last 30 days, for example, we can see the most installed packages for that period. By the way, I believe "install on request" means that the package was explicitly installed and wasn't pulled in as a dependency or anything like that, so that's what I'm most interested in.

Like I said, I wrote a script to do this manually without knowing this page existed, and I figured I could still make a video showing how I got this information, and then we can compare my results with their page to make sure it's working properly. If you're trying to learn, I think it's a good idea to come up with your own solutions to problems like these whenever you run into them, even if a solution already exists. Not only is it good practice, but you're also solving a problem you're actually interested in, and you'll feel a bit more pride once you've done it.

So let me show you how I came up with my own list of the most popular packages, based solely on the package information they list. In their list of packages I didn't see any sorting functionality or a way to get to the most installed packages, but I did see that they have some JSON APIs. On this page for a specific package, there's a JSON API link for just that package, so I'll open it in a new tab. The URL ends in /api/formula/ followed by the name of the package and .json. Let me make this a little larger. It isn't formatted, so it's jumbled together and hard to read, but this JSON does have the analytics for this package. If I hit Ctrl+F and search for 30d, we can see the 30-day counts for installs, for installs on request, and for build errors. As I mentioned a bit ago, I think the difference between install and install on request is that install on request only counts packages that were installed explicitly, not ones installed as a dependency, so that's what I'm going to be after in this video.

Okay, so this is the JSON for this particular package, but we want to compare the analytics of all the packages, so let me go back to their full list of packages. Now that we're back on the main page for all of the packages in their core repository, we can see there's also a JSON API here, which is probably for all of the packages. I'll open that in a new tab as well; it takes a little longer to load since there are more packages. Okay, this is a large JSON file that contains information on all of the packages, but it doesn't contain the analytics.

Now let's create a new Python script so that we can begin to analyze the data they provide and read it a bit more easily. I'll copy the URL of this JSON for all of the packages. I have a blank Python script open in Sublime Text, which I just created in a folder on my desktop. For this script I'm going to use the requests library. I already have requests installed, but if you don't, you can install it with a simple pip install requests. Once it's installed, I'll import it with import requests. Now let's grab the contents of the JSON URL we copied from the browser. To do this, I can simply say r = requests.get(url), passing in the URL for all of the packages, and once we get a response, we can parse the JSON by saying packages_json = r.json().

If you're not familiar with the requests library, I have a detailed video on how to use requests and responses, and I'll leave a link to it in the description section below. In this video I'm going to focus more on solving the problem than on going into detail about each step.

Okay, so now we have that large JSON object from their API, but we don't really know what information it has on each package yet, so let's take a look. I could just print it, but it would be jumbled together just like we saw in the browser. Let's see what that looks like: I'll print packages_json without any formatting. If I run that, it takes a while to print everything, and if I make the output larger, we can see it's all jumbled together. That's pretty hard to read, so let's clean it up.

When working with JSON data, we can use the json module to dump an object to a string and tell it how we want that string formatted. So I'll add import json at the top. Then I'll replace my print statement with packages_string = json.dumps(packages_json, indent=2). json.dumps dumps the object to a string, and indent=2 makes the formatting much nicer. Now I'll print packages_string, save, and run it, and it should print a formatted response. I also have a detailed video on working with JSON data if you'd like more details on what we're doing here, and I'll leave a link to that in the description section as well. When we run that, we can see the output is formatted in a much better and more readable way.
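Here's a minimal sketch of that formatting step that you can run without hitting the API. The two records below are placeholders standing in for the real r.json() result, which holds thousands of package objects with many more keys.

```python
import json

# Placeholder stand-in for packages_json = r.json(); the real list is much larger.
packages_json = [
    {"name": "a2ps", "desc": "Any-to-PostScript filter"},
    {"name": "example-pkg", "desc": "Made-up package for illustration"},
]

# Without indent, everything lands on one line, like the raw browser view.
raw = json.dumps(packages_json)

# indent=2 puts each key on its own line, nested two spaces per level.
packages_string = json.dumps(packages_json, indent=2)

# Dumping only the first element gives a single object instead of a list.
first_package_string = json.dumps(packages_json[0], indent=2)

print(packages_string)
print(first_package_string)
```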
From the closing square bracket at the end of the output, I can tell that this JSON is a list of values, and it looks like a list of all of the packages. Let me make this a bit larger so we can see it better. If we look at the information for one package, we have a name, a full name, aliases, and a description, and if I keep scrolling down there are various URLs and so on, but I don't see anything for analytics. Then this package's entry ends and the next package starts, again with a name, full name, and description, but no analytics.

Since I know this JSON is a list of packages, let me simplify things by dumping only the first package in that list to our string. Instead of dumping all of packages_json, I'll dump packages_json[0], the first index, which should be the first package. If I save that and run it, we can see from the output that this is no longer a list; it's a single object, the first package we got back from that JSON. This package is actually the one we opened in the browser, because it was at the top of the list.

So now we have a list of the available packages, but still no analytics data. We know each package has its own JSON with the analytics, because we saw it in the browser, so let's go back and look at the naming convention of the API for each package. I'll close these two tabs and click on a specific package again so we can see what its JSON URL looks like. I'll open that, copy the URL, and paste it into Sublime so we can read it a bit better; I'll paste it at the bottom of the script as a comment.

Okay, so the URL we used earlier gave us the JSON for all of the packages, and the URL I just pasted is the JSON for this specific a2ps package, which, as we saw earlier, does have the analytics data for that package. Let's look at how that URL is formatted: we have the base of the URL, then /api/formula/, then the name of the package, a2ps, followed by .json. If I look at the JSON we got back for all of the packages, it does contain the package name. So maybe we can use that name to generate our own URL and grab the JSON for a particular package.

Let me show you what I mean. First I'll grab the name of the first package: underneath packages_json, I'll say package_name = packages_json[0]['name'], which accesses the first package and then its name key. Now let's generate a URL using the naming convention we got from the browser. I'll take the URL we pasted, uncomment it, wrap it in single quotes to turn it into a string, and move it up here. Then I'll replace the hard-coded package name with our package_name variable using an f-string, which lets us put variables directly into a string: I'll assign the string to package_url, put an f in front of it, remove the hard-coded a2ps, and put {package_name} in its place.
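As a small sketch, the URL-building step can be pulled into a function. The base URL is an assumption on my part (the walkthrough only reads out the /api/formula/<name>.json part), so check it against the JSON API link on a package page; package_url is a hypothetical helper name.

```python
# Assumed base URL for Homebrew's formula JSON API; verify it in the browser.
BASE_URL = "https://formulae.brew.sh"


def package_url(package_name: str) -> str:
    """Build the per-package JSON URL following /api/formula/<name>.json."""
    return f"{BASE_URL}/api/formula/{package_name}.json"


print(package_url("a2ps"))
```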
Now let's make a request to the URL we just generated and see if we get the analytics data for this single package. I'll say r = requests.get(package_url), and then, calling it package_json instead of packages_json, package_json = r.json(). That's the response JSON from the request to that single package URL. Now I'll reuse the logic we used to print the formatted data before, with our new JSON for this particular package: I'll call the variable package_string instead of packages_string and dump package_json. I don't need to access any index, because it's not a list of values; it's just the JSON for that particular package.

If I save that and run it, it looks like we got a response back. Scrolling up to the top, the name of the package we got is a2ps, so we got the right information. Scrolling down toward the bottom, we can see the analytics: installations for the past 30, 90, and 365 days, plus the analytics for installs on request. So it looks like we got the information we want for that one package.

Now let's look at how to access those analytics specifically. Let me make this a little larger and scroll up a bit. In this dictionary, the analytics live under a key named analytics, and we want the number of installs on request. Within analytics we have a key called install_on_request, and within that we have keys for 30d, 90d, and 365d, which are 30 days, 90 days, and 365 days. Within each of those keys there is another key with the package name, in this case a2ps, and the value of that key is the number of installations on request.

So we have a couple of layers to dig through, but let's see how to grab those values. I'll make this a little smaller again and replace the print statement; instead, I'll get the installs for 30, 90, and 365 days. First, installs_30 = package_json['analytics']['install_on_request']['30d'][package_name]. Reading that from left to right: package_json is the dictionary for this JSON data; within it I access the analytics key, which holds another dictionary; within that I access the install_on_request key; and then I want the 30-day values, so I access the 30d key. Any time you need to know where to go next, you can check the output. Within 30d there's one more key, the package name. We already have that in a variable, so instead of hard-coding it, I'll use package_name. The value of that key should be what we're after: the number of installs on request for the last 30 days.

Now I'll copy this line and use it for the last 90 and 365 days as well: installs_90 accesses the 90d key, and installs_365 accesses the 365d key. So far we have the package name and the analytics data for three different date ranges. I also want to grab the description of this package so we know what it's meant to do, so underneath the package name I'll add package_desc = packages_json[0]['desc']; the key for the description is desc.
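Here is a sketch of those nested lookups wrapped in one hypothetical helper. The package JSON is a trimmed-down stand-in that only keeps the nesting described above (analytics, then install_on_request, then 30d/90d/365d, then the package name); the counts are the a2ps numbers quoted in this walkthrough.

```python
# Trimmed-down stand-in for package_json = r.json(); the real payload has many more keys.
package_json = {
    "name": "a2ps",
    "analytics": {
        "install_on_request": {
            "30d": {"a2ps": 109},
            "90d": {"a2ps": 324},
            "365d": {"a2ps": 1410},
        }
    },
}


def installs_on_request(package_json: dict, package_name: str) -> dict:
    """Hypothetical helper: return install-on-request counts keyed by period."""
    by_period = package_json["analytics"]["install_on_request"]
    return {period: by_period[period][package_name] for period in ("30d", "90d", "365d")}


package_name = package_json["name"]
print(installs_on_request(package_json, package_name))
```

Plain indexing raises a KeyError as soon as any layer is missing, which makes unexpected data obvious; a chain of .get() calls with defaults would be a more forgiving variant.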
If you forget what a key is called, you can always go back and look at the original JSON. Okay, now let's print all of the data we have so far to make sure it looks correct for this single package: the package name, the package description, and installs_30, installs_90, and installs_365. Before I run this, the values we should get for 30, 90, and 365 days, which we can see in the current output, are 109, 324, and 1,410. If I save and run it, we get the package name, the description, and then 109, 324, and 1,410, so that's what we expected.

Perfect, so now we have the information we want for one package; let's see if we can do it for all of them. This is going to require a lot of different URLs. To see how many, we can print the length of the packages_json list at the top. Let me comment out everything below packages_json and print how many packages we're going to need to analyze with print(len(packages_json)), which gives me the length of the packages list. If I save and run that, we can see there are 4,718 packages. The way we're doing this, that means making requests to the individual JSON files of 4,718 packages.

If you've watched my video on web scraping, I mentioned that it's not very considerate to hammer a server with a lot of requests at one time, and some APIs even have limits on how often they'll allow you to make a request. I didn't see any limits in Homebrew's API documentation, but it's still nice to be as courteous as possible. These are pretty lightweight JSON responses, so it shouldn't put too much strain on their site, but I still think it's a good idea to put in a slight delay before each request so we're not firing them off one right after another. So I'm going to sleep between requests, and for the amount of time to sleep, I'll use the amount of time it took to get a response from the site. That way, if the responses start slowing down, our program sleeps longer between requests, so we're not slamming their API.

Let's see how to do this. First we want to loop over all of the packages, so I'll replace that print statement with for package in packages_json:, since that's a list of all of the packages. I'll clear the output for now so we can see everything, uncomment all of this logic, and indent it into the for loop. Then I need to make a couple of changes. Inside the loop we're working with the package variable instead of packages_json, so where I was accessing the first package from the list, I'll now access the name of the current package, package['name'], and its description, package['desc']. The rest of the lines look fine. We're no longer using the formatted string to test, so we can get rid of package_string. Now all of this logic should apply to every package.

Before I put in any sleep statements, I want to check that this works. It should print the install analytics for the different packages, so I'm going to run it for a second and kill it as soon as I get a few responses. If you're following along, please don't do this part unless you know how to kill your program, because otherwise it will go out and make those 4,700 requests, and we aren't even capturing the information yet, so it would be a long wait for nothing. I'll run it, and I can see some responses coming in, so I'll kill it with Ctrl+C. It looks like we were getting information for different packages: the package names, the descriptions, and the installs for the last 30, 90, and 365 days. Perfect.

Now we want to capture this information and put in that sleep between requests. I think the best way to capture it is to create a list of dictionaries and then save that entire list to a file on our own machine. You might wonder why I'd save it when I could just analyze the data on the spot. The reason is that you may want to revisit the data or analyze it a different way later, and it's much easier and faster to analyze a single file on your own computer than to request those 4,700 JSON files every time you want to look at something differently. So for now, our main goal is to capture this information and save it to our own custom JSON file. To do this, above our for loop I'll create an empty list called results: results = [].

Now, within our for loop (let me close the output again so we can see), after we've grabbed the installs for the last 30, 90, and 365 days, we'll put that into a dictionary, and let's make it a nested dictionary. First we'll have keys for the package name and description: data = {'name': package_name, 'desc': package_desc, ...}. I'm calling the description key desc, just like they did. For the analytics, let's use the same keys we pulled from their API: a key called analytics whose value is another dictionary, with a 30d key for the installs over the last 30 days, a 90d key for the last 90 days, and a 365d key for the last 365 days. I'll take out the trailing comma after that last key, since there are no more keys after it.

Again, in this video I'm analyzing installs on request, but you can analyze whatever you want with this script. If you'd also like the analytics for regular installs, you could grab that data from the original JSON, and you could also grab the number of build errors or anything else you want to add. It's completely customizable; I'm just using installs on request for the last 30, 90, and 365 days. Now that we have the information for a single package, let's append that data to our results list.
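To make the shape of each entry concrete, here is a small sketch with a hypothetical build_record helper that returns the nested dictionary described above. The a2ps counts come from this walkthrough; the description string is just a placeholder value.

```python
def build_record(name, desc, installs_30, installs_90, installs_365):
    """Hypothetical helper: one package in the nested shape used for results."""
    return {
        "name": name,
        "desc": desc,
        "analytics": {"30d": installs_30, "90d": installs_90, "365d": installs_365},
    }


results = [build_record("a2ps", "Any-to-PostScript filter", 109, 324, 1410)]

# Nested values are reached key by key, the same way as in the API JSON.
print(results[0]["analytics"]["365d"])
```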
Still within the for loop, I'll say results.append(data). We'll go over this again in a second to make sure we understand everything so far, but first let me add the sleep delay after we've appended the data, so the program sleeps for a bit before it advances the loop and makes the next web request. To sleep, I'll import the time module. Then, after appending the data, I'll call time.sleep() with the amount of time it took to get the response. That way, if responses start slowing down for any reason, say it takes five seconds to get a response back, our program will sleep for five seconds before making the next request. It's just a buffer to take it easy on the server if it looks like it's slowing down.

The response object from requests has a built-in attribute that gives us this. Our response is stored in r, so we want to sleep for r.elapsed. r.elapsed is a timedelta, and a timedelta has a total_seconds() method, so we call it with parentheses: time.sleep(r.elapsed.total_seconds()).

Now let's do a quick recap so we understand everything we've done so far. We have our empty results list, and then our for loop over all of the packages we got back from the initial request. For each one, we get information from the package such as its name and description, then make a request to the URL for that package's JSON file, which contains the analytics, and parse out the information we want from that specific JSON file. Then we create a dictionary with our own custom information: the name, the description, and the analytics for that one package. Then we append that package to our results list. So after this for loop finishes, we should have a list of all of the packages with all of this information.

To show you what this looks like, and to test that it's working the way we expect so far, I'm going to add a break statement underneath time.sleep(). That runs through the for loop once, and when it hits the break statement, Python breaks out of the for loop and continues with the code after it. Once we're out of the loop, let's print the results list to see what it looks like at that point. I'll get rid of the print statement inside the loop, go down a couple of lines, and unindent back to the main level, which takes us out of the for loop. Then I'll print results.

If I save that and run what we have so far, we can see that right now we have a list containing only the first package, because we put that break statement in the for loop. Once we take the break statement out, our list will contain the information for all of the packages. But before we take it out, let's add the logic for saving that list of dictionaries to a JSON file, so we can make sure that works with what we have now. Notice that as I'm developing this script, I'm building toward the final result a little at a time, always with small chunks of test data.
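To exercise the whole loop without sending thousands of requests, here is a sketch where the network call is passed in as a parameter. fetch is a hypothetical stand-in for requests.get: it returns the parsed JSON plus a timedelta, like Response.elapsed. The limit parameter plays the role of the temporary break statement, and sleep is injectable so the test doesn't actually wait.

```python
import time
from datetime import timedelta
from itertools import islice


def collect(packages, fetch, sleep=time.sleep, limit=None):
    """Build the results list, pausing as long as each response took."""
    results = []
    # islice(packages, None) yields everything; a number stops early, like the break.
    for package in islice(packages, limit):
        name, desc = package["name"], package["desc"]
        package_json, elapsed = fetch(name)
        by_period = package_json["analytics"]["install_on_request"]
        results.append({
            "name": name,
            "desc": desc,
            "analytics": {p: by_period[p][name] for p in ("30d", "90d", "365d")},
        })
        # Back off for as long as the server took to answer this request.
        sleep(elapsed.total_seconds())
    return results


def fake_fetch(name):
    """Offline stand-in: fixed counts and a 250 ms 'response time'."""
    counts = {"30d": {name: 3}, "90d": {name: 9}, "365d": {name: 30}}
    return {"analytics": {"install_on_request": counts}}, timedelta(milliseconds=250)


pauses = []
packages = [{"name": "a", "desc": "A"}, {"name": "b", "desc": "B"}]
first_only = collect(packages, fake_fetch, sleep=pauses.append, limit=1)
print(first_only)
print(pauses)
```

In the real script, fetch would call requests.get on the package URL and return r.json() and r.elapsed. Injecting it is only a design choice that makes the loop testable offline.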
If I were to take out the break statement and write the code for saving the JSON file without testing it first, the script could spend a long time downloading all of that data only to crash at the very end, which would waste a lot of time when we could have just tested it with a little bit of data first.

Okay, so let's write this JSON file. I'll remove this print statement and say with open('package_info.json', 'w') as f: to open a new file called package_info.json in write mode. Within this context manager, I'll call json.dump(results, f, indent=2). We use dump when we're dumping to a file and dumps when we're dumping to a string. So we dump the results list to f, our open file, and the indent of 2 makes the file nicely formatted. If I save and run this, it should create a package_info.json file in the current working directory; since I'm running the script from its own folder, that's the directory where this script lives. In the sidebar I have that directory open, and we can see package_info.json. If I open it, this is what got written to the file, and it looks like what we expected.

Now we can remove the break statement from our for loop, and it should go through all of the packages and add them to our JSON file. Before I run it, though, I'm going to add some print statements so we can keep track of where we are, and I'm also going to time how long it takes to go through all of these URLs.
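Here's a small, self-contained sketch of the save step with placeholder data. One deliberate difference: it writes into a temporary directory rather than the working directory, so running it won't overwrite a real package_info.json.

```python
import json
import os
import tempfile

# Placeholder data shaped like the results list built in the loop.
results = [
    {"name": "a2ps", "desc": "Any-to-PostScript filter",
     "analytics": {"30d": 109, "90d": 324, "365d": 1410}},
]

# A bare name like "package_info.json" resolves against the current working
# directory; this sketch uses a temporary directory instead.
path = os.path.join(tempfile.mkdtemp(), "package_info.json")

with open(path, "w") as f:
    json.dump(results, f, indent=2)  # dump writes to a file; dumps returns a str

with open(path) as f:
    contents = f.read()

print(contents)
print(json.loads(contents) == results)  # the data survives the round trip
```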
First, I'll add a print statement that gives us feedback that things are still going well. Before the break statement, I'll print an f-string saying that we got the package name and how long it took: print(f'Got {package_name} in {r.elapsed.total_seconds()} seconds').

I'm also going to time the entire loop. Up here, before our for loop, I'll say t1 = time.perf_counter(), which is a way to get accurate timings in Python. Then after the loop, outside of it, I'll say t2 = time.perf_counter(), and before we write the JSON file I'll print an f-string: f'Finished in {t2 - t1} seconds'. So we start a timer before the for loop and measure the time again afterward; if the loop took 20 minutes, then t2 - t1 gives us those 20 minutes in seconds.

Finally, I'll remove the break statement from our for loop and run the script, and hopefully it will grab everything we want and save it to our JSON file. I'll take out the break statement, save, and run it. We can see it's getting the data for these individual packages, so it looks like it's working nicely. This is going to take a while because there are so many packages to go through, so I'll pause the video and pick it back up once it's finished, and we'll see how long it took.

Okay, I paused the video and let that script finish, and it finished in a little over 1,400 seconds. Dividing that by sixty, that's roughly twenty-four minutes to pull down the JSON for all of those package files.
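Here's a minimal sketch of the timing pattern, with a cheap computation standing in for the download loop. format_duration is a hypothetical helper for the seconds-to-minutes conversion, and 125 is just an illustrative input.

```python
import time


def format_duration(seconds: float) -> str:
    """Hypothetical helper: turn seconds into 'Xm Ys' (rounded to whole seconds)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s"


t1 = time.perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in for the slow download loop
t2 = time.perf_counter()

print(f"Finished in {t2 - t1:.3f} seconds ({format_duration(t2 - t1)})")
print(format_duration(125))  # 2m 5s
```

perf_counter values are only meaningful as differences, which is why the pattern always subtracts a start time from an end time.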
in the sleep statement, it probably would have finished in about half that time, since we were essentially doubling the run time by sleeping for as long as each response took. But we can see that we didn't get any errors, so that's a good sign. Now we should have all of that data saved into our JSON file in the same directory as our current script, so let me check that out. I'll reopen package_info.json, and if we look at this file we can see that we have the information for all of these packages: each package name, the description, and the analytics for each one. So this is looking pretty good.

So now we can actually do what we first set out to do and use this data to determine the most popular Homebrew packages. To do this I'm going to create a new script in the same directory that we're currently in. I'll say New File, and let's just call this popular.py or something like that. Within this file, I'm first going to load the JSON file that we just generated. So I'll import json, and to open that file I'll say with open('package_info.json') as f. We want to open that in read mode; we could pass 'r' explicitly if we want, but read is the default, so either way works. So f is our open file. Now I'll say data = json.load(f). When we want to load JSON data from a file we use load, and I want to load from f, which is our JSON file. Okay, so if I print this data, it should now be a Python list of all of our dictionaries. Let's print that out to check. I want to be outside of that with statement, so I'll say print(data) at the outer level. If I save that and run it, we can see that this looks good: it's a list of all of that data. Okay, so now all we have to do is sort this list, but we're going
to need to write a custom sorting function, because Python doesn't know what this data is or how we want it sorted, so we need to tell it. Now, I have a separate, more detailed video on custom sorting, and I'll leave a link to that video in the description section below, but for this example I'll just show you how we can sort this specific list. First I'm going to create the function that we'll use to sort the list. I'll remove this print statement and create the sorting function at the top of this file; let me close the output so we can see a little better. I'm going to call this function install_sort, and we want to pass in a package as an argument. Now we need to return the value that we want to sort on, so depending on whether we want to sort by 30, 90, or 365 days, we'll return that value. First let's sort by 30-day installs, so I'll say return package, access the analytics key of each package, and return the value of the 30-day key from that. Again, if you don't know how these sorting functions work, then definitely watch my more detailed video on sorting objects so that you know exactly what's going on here. Now, the way we have this right now, it's going to sort these in ascending order, so packages with zero installs will be at the beginning and packages with a lot of installs will be at the end. I'd rather see the packages with a lot of installs at the beginning, but that's no problem: we can simply say reverse=True when we actually sort this. So this function is finished, and now we can go ahead and sort the data. We have two ways to do that: we can sort it in place using the sort method, or we can keep the original data unsorted and capture the sorted data in a new variable using the sorted function. Now I'm just
going to sort the data in place in this example, which means we don't have to create a new variable. So here at the bottom I'll say data.sort(), and inside we want key= followed by the function that we'll use to sort this data, so I'm going to pass in install_sort. We want to pass in the function itself, not execute it, so we leave the parentheses off. We also want to set reverse=True, because, like I said, this sorting function sorts in ascending order and we want descending order. So now data should be a sorted list with the most installed packages of the last 30 days at the beginning. Let's print this sorted list out. I want this to be readable, so I'm going to use the json module to dump it to a formatted JSON string, just like we did earlier in this video. To do that, I'll say data_str = json.dumps(data, indent=2); dumps dumps to a string, and indent=2 makes sure that the string is nicely formatted. Now I'll just print out that data string. If I save that and run it (oops, I accidentally pasted something in there), okay, if I save that and run it and scroll all the way up to the top, then we should see the most installed packages of the last 30 days. We have node, python, git, wget, yarn, so that's pretty interesting. And if we look at the 30-day totals, we have about 255,000, then 254,000, then 142,000 and 140,000, so it does look like this is in descending order. That's good. Now let's compare this to the version on their website to see if it looks similar to what they have. So this is the analytics for
install on request for the last 30 days. We have node, python, git, wget, yarn, so those are the same ones that we saw, and for the numbers we have 255, 254, 142, and 140 thousand, which match the results that we got from our custom script. Now, I do think that the way we built our custom script gives us some additional functionality that I don't think they have in their online API for this analytics data. For example, let me open the browser back up: here is their JSON API for the analytics data for the last 30 days. Let me make this a little larger; I know this is probably hard for you to see. All they have here are the numbers: the ranking (node is number one), the formula name, and the counts, which we have as well. But you can see that they don't have a description for the package. It's just the package name, the count, and the percentage. So if we wanted to know the descriptions of these packages, we'd have to look them up one by one, but with the custom script that we built, we do have those descriptions, and that also allows us to filter by description. So let me make this page a little smaller and go back to my script. Here I could simply write a list comprehension to filter the data based on a certain description. Let's say that I only wanted packages that had the word "video" in the description. To do that, I could say data = [item for item in data if 'video' in item['desc']], since desc is the description key the way we set it up. Now, if you're unfamiliar with how list comprehensions work, I do have a separate video on those in detail as well, and I'll be sure to put a link to that video in the description section below. But what
this list comprehension is doing is saying that we want all of the items in our data list whose description contains "video". So if I save this, run it, and go up to the top (it's still kind of a long list), we can see the top packages for the last 30 days with "video" in the description. We have ffmpeg ("play, record, convert, and stream audio and video"), youtube-dl ("download YouTube videos from the command line"), a VP8/VP9 video codec, and so on. So it's pretty nice that we have this kind of functionality in our custom script. Now let's say we want the top packages for the last 365 days with the word "video" in the description, and let's also take just the top five of those. For the 365-day totals, install_sort returns the 365-day value instead of the 30-day one, and to take just the top five we can use list slicing to access those first five values. So here, where I'm dumping the data list to a string, I can simply slice it with data[:5] to say that I only want up to the fifth item. If I save that and run it, we're not going to have nearly as many items here; we should have five: one, two, three, four, five. Okay, so for the top packages of the last year, youtube-dl is the top one, then ffmpeg, MediaInfo, and HandBrake. So that's pretty neat. Now let me close down that output. Okay, so that's pretty much it for the manual script that I wrote to sort Homebrew packages by popularity. Like I said, I wrote this script before I knew that they already had an API with most of this information, but as we saw with being able to narrow down packages by a certain description, this definitely has some uses that we can't get from their existing API, or at least I don't think that's possible with their current API; I could be wrong about that. So all in all, I definitely think it was worth the time to write this up. Also, as you build something like this, you're
going to get some practice using tools that you maybe haven't used in a while, and it's always good to take on real-world projects like this to keep your skills sharp. But with that said, I think that's going to do it for this video. Hopefully you found this interesting and learned some new tricks for solving problems like this that you might run into. Projects like this can always be expanded further: you could use matplotlib to graph these results in some cool way, or analyze some other part of the data that we're getting from our file. If anyone has any questions about what we covered in this video, then feel free to ask in the comment section below and I'll do my best to answer those. And if you enjoy these tutorials and would like to support them, there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up, and it's also a huge help to share these videos with anyone who you think would find them useful. If you have the means, you can contribute through Patreon, and there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.
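The steps in popular.py (load the saved JSON, sort with a key function in descending order, keep only packages with a word in the description, and slice off the top few) can be sketched as one small module. This is a minimal sketch, not the script from the video. The records in SAMPLE are made up, and the "desc", "analytics", "30d", "90d" and "365d" keys are assumptions based on the structure described in the transcript. install_key is a hypothetical generalization of install_sort that takes the period as a parameter instead of hard-coding the 30-day value.

```python
import json
import os
import tempfile

# Made-up records shaped like the entries described in the video: a name,
# a description under "desc", and install counts per period. The period
# keys "30d", "90d" and "365d" are assumptions for this sketch.
SAMPLE = [
    {"name": "pkg-a", "desc": "Play, record, convert, and stream audio and video",
     "analytics": {"30d": 50, "90d": 140, "365d": 600}},
    {"name": "pkg-b", "desc": "Download videos from the command-line",
     "analytics": {"30d": 80, "90d": 200, "365d": 500}},
    {"name": "pkg-c", "desc": "Lightweight JSON processor",
     "analytics": {"30d": 300, "90d": 900, "365d": 3000}},
    {"name": "pkg-d", "desc": "Codec library for video",
     "analytics": {"30d": 5, "90d": 20, "365d": 90}},
]


def load_packages(path):
    """Read the saved list of package dictionaries back from disk."""
    with open(path) as f:  # "r" (read mode) is the default
        return json.load(f)


def install_key(period):
    """Build a sort key that returns a package's install count for `period`."""
    def key(package):
        return package["analytics"][period]
    return key


def top_packages(packages, period="30d", keyword=None, n=None):
    """Return packages ordered from most to least installed for `period`.

    If `keyword` is given, keep only packages whose description contains it
    (a case-sensitive substring test). If `n` is given, keep the first n.
    """
    if keyword is not None:
        packages = [item for item in packages if keyword in item["desc"]]
    # sorted() returns a new list and leaves the caller's list unchanged;
    # reverse=True flips the default ascending order to descending.
    ranked = sorted(packages, key=install_key(period), reverse=True)
    return ranked[:n]  # slicing with n=None keeps every item


if __name__ == "__main__":
    # Round-trip through a temporary file to mirror reading package_info.json.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "package_info.json")
        with open(path, "w") as f:
            json.dump(SAMPLE, f)
        data = load_packages(path)
    top = top_packages(data, period="365d", keyword="video", n=2)
    print(json.dumps(top, indent=2))
```

top_packages uses sorted(), so the list it receives keeps its original order. To sort in place as the video does, data.sort(key=install_key("30d"), reverse=True) does the same job. In both cases key= receives a function object: install_key("30d") is called once to build that function, and the sort then calls it on every element.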

22 Replies to “How to Write Python Scripts to Analyze JSON APIs and Sort Results”

  1. We're covering a lot in this video. We're going to see how to request data from a JSON API, parse out the information we want, sort the data using a custom key, and a lot more. We're going to be using the Homebrew API, but you don't need to use Homebrew or a Mac to follow along. This can apply to many other APIs you'd like to analyze with Python.

    As I mention in the video, our final result does have a little more functionality than what we can get from their existing API, so I believe it was definitely a useful exercise. For example, we can filter popular packages by a keyword in the description, which is a nice addition. Hope you all find this helpful!

  2. Corey, another fantastic video. As always, I come away with so much education from your tutorials, and I appreciate the work you put into these. One question: how can I take my sorted data and dump it into an Excel spreadsheet? Is it possible for you to pick up from the end of this video and show us the steps to put it into Excel? That would be very useful. Thanks again, great work!

  3. Hi Corey. Great video, again! Do you usually post the code on GitHub or somewhere for further reference? While following along with your code, I was doing work on another project using similar patterns.

  4. Corey what's the best way to approach this: Getting a certain key value when there's a regex match on another key value? Any suitable videos?

  5. Amazing as always, Corey! Just curious how you would modify this to break up the requests into a few parts, rather than making all the requests and then storing the data at the end. The reason I ask is that on slower connections, if it errors out in the middle, you lose everything that's been retrieved. My thinking is to use range() to break up the loop into smaller parts and then change the subsequent writes to our package_info.json file to append new package data (rather than overwriting). Sound sensible? Anything I might be missing? Thanks as always for the amazing content!

  6. You might possibly get a key error for the 'path-extractor' package, since under the analytics section the name of the package is "path-extractor --HEAD", not 'path-extractor'.

  7. Got an error "#KeyError: 'path-extractor'" – tried for 5/10 min to find if Google had a quick answer but no – made a try/except.
    Great videos – keep it up 🙂

  8. Hey Corey, pls make a video on how to spam filter emails and also regular filtration. I need this to save time. Pls come to my aid

  9. Thanks Corey for another great video. I am following your tutorial using my Jupyter notebook. Unfortunately, the Jupyter Notebook server I am on stops sending out output (which is not to crash homebrew server). Can we simply download the file and then read it into your script? Please advise.

  10. Hey Corey, I'm a great fan of your tutorials. I had a request though, would you probably do a series on ReactJS?

  11. Corey, thank YOU for your work. I have just one little but huge request that may change my life, and the lives of others who have been through all the basic beginner's courses. I hope to watch you review and work through great code such as Flask: read it and explain to viewers, in your way of teaching, what makes a great piece of code, so we can learn how to write beautiful code and how the logic should be built for real-world users. I might sound crazy, but this is the answer I have been looking for in all the other online courses and haven't found yet. The reason I ask you instead of all the other coding teachers on YouTube is that I believe you are the one who cares that we understand the right way. Thank you again.

  12. Hey Corey, any plans to do something like creating our own api with Django Rest Framework, would be great to see how you would approach it, just started using it myself to render charts with Chart.js based on the data in the database and it was very quick to get off the ground. Used your django project as the base for my project so thanks a mil for that. I'm sure I will make some mistakes along the way so would love to see how you do it.
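Reply 5 asks how to avoid losing everything if the long download loop fails partway through, and replies 6 and 7 report a KeyError for path-extractor, whose counts are listed under "path-extractor --HEAD". The sketch below is one possible way to handle both, not the code from the video. It keeps the video's timing with time.perf_counter() and its sleep-for-as-long-as-the-response-took idea. It appends each record to a JSON Lines file (one JSON object per line) so finished work survives a crash and is skipped on a rerun, and it catches the KeyError the way reply 7 did. Several pieces are assumptions: fetch is a hypothetical stand-in for the real request that must return the parsed JSON plus a timedelta such as requests' Response.elapsed, and the "analytics" / "install_on_request" / period / package-name nesting and the "30d", "90d", "365d" keys are assumed, not confirmed. fake_fetch and the pkg-a/pkg-b records are made up so the sketch runs offline.

```python
import json
import os
import tempfile
import time
from datetime import timedelta

PERIODS = ("30d", "90d", "365d")  # assumed period keys


def load_checkpoint(path):
    """Return every record saved so far, or an empty list if none exist yet."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def collect(packages, fetch, out_path, pause=True):
    """Fetch install counts package by package, appending each to out_path.

    Returns (number of records saved in this run, names that were skipped).
    """
    done = {record["name"] for record in load_checkpoint(out_path)}
    saved, skipped = 0, []
    t1 = time.perf_counter()
    with open(out_path, "a") as out:  # append mode keeps earlier progress
        for package in packages:
            name = package["name"]
            if name in done:
                continue  # already saved by an earlier, interrupted run
            package_json, elapsed = fetch(name)
            if pause:
                # Wait as long as the server took to answer, as in the video.
                time.sleep(elapsed.total_seconds())
            try:
                counts = package_json["analytics"]["install_on_request"]
                analytics = {period: counts[period][name] for period in PERIODS}
            except KeyError:
                # e.g. counts filed under "path-extractor --HEAD" instead
                skipped.append(name)
                continue
            record = {"name": name, "desc": package["desc"], "analytics": analytics}
            out.write(json.dumps(record) + "\n")
            out.flush()  # write each record out now instead of buffering it
            saved += 1
    t2 = time.perf_counter()
    print(f"Finished in {t2 - t1:.2f} seconds")
    return saved, skipped


# Offline demo data (made up).
DEMO_PACKAGES = [
    {"name": "pkg-a", "desc": "Example package"},
    {"name": "pkg-b", "desc": "Example package built from HEAD"},
]


def fake_fetch(name):
    """Stand-in for an HTTP request; pkg-b mimics the '--HEAD' key mismatch."""
    key = name if name == "pkg-a" else f"{name} --HEAD"
    counts = {period: {key: n} for period, n in zip(PERIODS, (1, 2, 3))}
    return {"analytics": {"install_on_request": counts}}, timedelta(0)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "package_info.jsonl")
    print(collect(DEMO_PACKAGES, fake_fetch, path))  # first run
    print(collect(DEMO_PACKAGES, fake_fetch, path))  # rerun skips pkg-a
    print(load_checkpoint(path))
```

Appending one JSON object per line is a design choice: a single JSON array like package_info.json would have to be rewritten in full to add a record, while a JSON Lines file can simply grow. load_checkpoint returns the same kind of list of dictionaries that json.load gave back in the video, so the sorting and filtering steps can run on it unchanged. Skipped names are refetched (and skipped again) on every rerun; mapping keys such as "--HEAD" variants back to a package would need a rule of your own.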

