Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Multiple Pages get rank for one query
Hi to all experts, In google search console , I've found that multiple pages get rank for my main query. How can I solve that? aoi59
Reporting & Analytics | | tarahshiiid0 -
Query on google analytic event report and customized report - Please help.
For my site I placed Black Friday offer.Now I have highlighted one product at homepage banner and tracking that with event. I am sharing with you actual figures of 1 day from Google analytic. Your proper response on this query will really really really helpful to me for understanding google analytic deeply and properly. So pls give your precious time for answering me, I will be thankful to you from bottom of my heart. I am checking event report from behavior, following is report for by that product event - Sessions - 30
Reporting & Analytics | | pragnesh9639
Unique Events - 30
Total Events - 34
Ecommerce Conversion Rate - 3.33% I am checking now report from All pages via Behaviour section - Page views - 97, Unique Page Views - 62, Entrances - 7, Bounce Rate - 57.14%, %Exit - 17.53% Now I create customize report for the same by adding
session -7
Ecommerce conversion rate - 0
unique events -4 MY query - when I added session in customized report then it is different from Event session why (7 and 30) ? same way Ecommerce conversion rate in customized report showing 0% why ( 0% and 3.33% ) ? same way unique events 4 and 30 ? do you suggest what is the best way to analysis such page or what will be best customize report for such analysis? Thanks0 -
When I click on organic search, the biggest drop is on keyword "not provided". What does this mean?
I am trying to identify the reason for the drop in organic search.
Reporting & Analytics | | Sable_Group0 -
Is there a way to map your on-page SEO changes with the organic growth?
Hi Mozzers, I was just wondering if there's a way we can map our on-page SEO changes with the increase/decrease in organic traffic. For instance, I introduced brand pages' link the product page breadcrumbs and suddenly organic traffic for my brand pages increase from X to 2X in 1 couple of weeks. Now, this can be because of this breadcrumb change purely or because of some algorithm update or may be, bots started finding the content interesting and hence, started ranking them up (in case the brand pages were launched recently). So, you can't say which change should be mapped to what increase/decrease in organic traffic. Or, is there a way to map this?
Reporting & Analytics | | _nitman0 -
What is the best way to embed Google Analytics charts on our site?
We want to build out this functionality so a client can log into our site and view data. I know its possible but I can't find any articles about the steps so that my team can move forward.
Reporting & Analytics | | appbackr0 -
Paid Search Referral
I have a brand new site with a paid search referral in my G&A, but we are not running any adwords or any paid marketing for it. The referral is "not set" so I do not know where it is coming from.
Reporting & Analytics | | KJ-Rodgers0 -
What is the difference between "Organic Traffic" and the "Non-Paid Search Traffic" default segment in Google Analytics?
These two filtering options ("organic traffic" in the left sidebar and "non-paid search traffic" in the advanced segments) give me slightly different numbers. Any idea why this would be the case?
Reporting & Analytics | | FPD_NYC1 -
What's the best way to track moble traffic / sales from a subdomain in Google Analytics?
Hello, I've been trying to figure out the best way to track traffic and sales for a subdomain mobile site and would appreciate any feedback or guidance. I've set up an advanced segment but have found that I'm not able to source traffic since the segments get disabled when trying to view top content pages. I've also setup an additional sub-profile under the main domain profile in google analytics and have set up an advanced filter using the following guildelines http://code.google.com/apis/analytics/docs/tracking/gaTrackingSite.html In the Profile Settings page, click the Add Filter link. Choose Add New Filter and provide the filter a name. Choose Custom Filter and select Advanced on the Filter type settings. Under Advanced settings: FieldA should be set to Hostname FieldB should be set to Request URI Set the values for both Field A and Field B to (.*), which is an expression that captures all characters. Set the Output To --> Constructor option to Request URI and provide $A1$B1 as the value for that choice. I'm unsure if I've set up the profile / filter correctly, the traffic stats are different than the segment. The mobile profile is also tracking all ecommerce transactions instead of just mobile. Here's the code I have in place. we're using the Traditional code Thanks, Jamie
Reporting & Analytics | | marketing_zoovy.com0