Ways to analyze a 1M-row dataset of search queries
-
Hi,
I have a large dataset: about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (queries that include a product name, a location name, a transactional objective, an informational objective, etc.), as well as the density of certain keywords (a keyword here being a single word counted across all queries).
My idea was to use Excel and a macro to split every query into separate words (stripping punctuation and normalizing uppercase/lowercase), then store each word in a new worksheet, adding in another column the visit count from the row the word was extracted from (to give a sense of weight). Before adding a word to the new worksheet, the script checks whether the word already exists; if so, it just adds the current row's visits to the existing visit count.
In the end, this creates a sort of "dictionary" of all the keywords in all search queries, ranked by weight (= visits from the search queries that include the keyword).
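The word "dictionary" described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the macro itself: the function names and sample rows are made up, and each word is counted once per query (so a word repeated inside one query does not add that query's visits twice).

```python
import re
from collections import Counter

# Anything that is not a word character (letter, digit, underscore)
# or whitespace is treated as punctuation and replaced by a space.
_PUNCTUATION = re.compile(r"[^\w\s]")


def split_words(query):
    """Lowercase a query, strip punctuation, and split it on whitespace."""
    return _PUNCTUATION.sub(" ", query.lower()).split()


def word_weights(rows):
    """Sum visits per word over an iterable of (query, visits) pairs."""
    weights = Counter()
    for query, visits in rows:
        # set() counts a word once per query, even if the query repeats it.
        for word in set(split_words(query)):
            weights[word] += visits
    return weights


# Made-up sample rows, for illustration only.
sample = [
    ("Cheap hotels in Paris", 120),
    ("paris hotels", 80),
    ("What is a hostel?", 15),
]

# Words ranked by weight, heaviest first.
for word, visits in word_weights(sample).most_common():
    print(word, visits)
```

A `Counter` lookup takes roughly constant time per word, whereas scanning a worksheet column for an existing word gets slower as the list grows, which may be part of why a worksheet-based loop bogs down at this scale.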
I believe this would help me get started, because I can't segment and analyze 1M raw search queries directly.
My issue is that this VBA macro has been running on my (fast) PC for the last 24 hours and shows no sign of finishing. Obviously Excel + VBA is not the best way to do text mining and manipulation on such a large dataset (although it's just a 30 MB file).
What would you do if you had this dataset and wanted to mine the text/semantics as I am doing? Any ideas on tools or process?
I'm considering dumping this data into a MySQL database and doing the processing in PHP (the only backend language I'm versed in), then storing the summarized data in another table, which I can export to Excel for analysis. But I'm afraid I'll run into memory limit issues and such...
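On the memory concern: one common way around it, in any language, is to read the export one row at a time and keep only the running per-word totals in memory. Here is a minimal Python sketch of that idea (Python rather than PHP, purely for brevity). The column names `query` and `visits` are hypothetical, and punctuation handling is left out to keep it short.

```python
import csv
import io
from collections import Counter


def stream_word_weights(lines, query_col="query", visits_col="visits"):
    """Aggregate visits per word from CSV lines, one row at a time."""
    weights = Counter()
    for row in csv.DictReader(lines):
        visits = int(row[visits_col])
        # Only the current row and the running totals are held in memory.
        for word in set(row[query_col].lower().split()):
            weights[word] += visits
    return weights


# In practice `lines` would be an open file, e.g. open("queries.csv", newline="")
# (hypothetical file name). A small in-memory stand-in keeps the sketch runnable.
demo = io.StringIO("query,visits\nred shoes,10\nred hat,5\n")
weights = stream_word_weights(demo)

# Write the summarized table as CSV, ready to open in Excel.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["word", "visits"])
writer.writerows(weights.most_common())
print(out.getvalue())
```

The summary table has one row per distinct word, so it should be far smaller than the 1M-row input.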
In the meantime, I'm definitely interested in knowing what you guys would do if you had this data and simply wanted to start exploring what it's made of.
Thanks!
-
Yeah, Access can handle far more rows than Excel. Its limit is the size of the database file (2 GB), not a row count, so 1 million rows in a 30 MB file is fine. It's Microsoft's database program. You can import data and then create queries. It has a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So I assume Access won't have the same processing limitations with 1 million rows, will it?
Once I'm done with the "discovery phase" I'm going through with this keyword list, I'll definitely use advanced filters (in Excel), as you recommend, to understand keyword groups in detail.
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine that's because it's a built-in Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah" (the whole query), "* blah" (last word), "blah *" (first word) and "* blah *" (a word in the middle). Then you can use the Subtotal command to do calculations over the visible rows.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
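For comparison, the same whole-word match plus subtotal can be sketched outside Excel. The Python below is an illustration under assumptions, not the advanced-filter feature itself: the function names and sample rows are made up, and it uses the regex word boundary `\b` in place of the four space-based patterns. That is slightly looser, since it also treats punctuation as a boundary (a query like "blah-blah" would match).

```python
import re


def whole_word_pattern(word):
    """Build a case-insensitive regex matching `word` only as a whole word."""
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


def subtotal_visits(rows, word):
    """Sum visits over the (query, visits) rows whose query contains `word`."""
    pattern = whole_word_pattern(word)
    return sum(visits for query, visits in rows if pattern.search(query))


# Made-up sample rows, for illustration only.
sample = [
    ("blah", 10),
    ("buy blah online", 20),
    ("blahblah reviews", 40),
]

print(subtotal_visits(sample, "blah"))  # prints 30; "blahblah" is not a whole-word match
```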