Digital Humanities Questions & Answers » Tag: Visualization

Recent Posts: http://digitalhumanities.org/answers/tags/visualization
Wed, 04 May 2016 09:42:11 +0000

alangpike on "Big data projects using IMDB?"
http://digitalhumanities.org/answers/topic/big-data-projects-using-imdb#post-2022
Fri, 07 Jun 2013 08:21:48 +0000

Thanks Ben, I'll report back after more investigation. Cheers.

Ben Schmidt on "Big data projects using IMDB?"
http://digitalhumanities.org/answers/topic/big-data-projects-using-imdb#post-2020
Thu, 06 Jun 2013 23:06:22 +0000

I've looked some at the <a href="http://www.imdb.com/interfaces">plain text data files</a> they provide, and was surprised at how much one can do with them. They don't include reviews, but they do include most of the other things one might want: biographical information, fully standardized lists of cast and crew, etc. You can download all the data straightaway without needing permission and without having to write a crawler. There are some restrictions on redistribution, but nothing that seems like it would be onerous for scholarly research.

Social scientists have used it for some network analysis, e.g. <a href="http://sydney.edu.au/engineering/it/~dmerrick/papers/AhmedEtAl2007.pdf">here</a> and <a href="http://nwb.cns.iu.edu/papers/2007-herr-movieact.pdf">here</a>.

I don't have any experience with the data that you can't get through the text files.

alangpike on "Big data projects using IMDB?"
http://digitalhumanities.org/answers/topic/big-data-projects-using-imdb#post-2019
Thu, 06 Jun 2013 08:42:29 +0000

I am in the early stages of my dissertation and, as I explore the possible DH methods that might apply to my work, am wondering if there might be a way for the DH community to take advantage of the mountain of data at IMDB. I have heard from some folks that they have been unfriendly to researchers, but thought it might be worth some serious effort to make some (all!) of their data, reviews, etc. available to film and media studies scholars. The possibilities for DH methods are virtually limitless. For my own work, data visualizations, social network analysis of cast and crew, etc. are all intriguing possibilities.

Has anyone out there gotten permission from IMDB to use data crawlers for scholarly research? Has anyone tried to do so and been denied?

Michael Widner on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1961
Thu, 04 Apr 2013 15:46:53 +0000

A quick tip for anyone doing similar work. The Python library NetworkX (<a href="http://networkx.github.com/" rel="nofollow">http://networkx.github.com/</a>) makes it very easy to create graph files that Gephi (and other programs) can read. For example, it will output a GEXF file for you from nodes and edges that you create programmatically. You can set the colors, size, and other attributes so that your data is formatted for display in Gephi without all the manual work that sometimes requires.
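A minimal sketch of the NetworkX tip above, assuming NetworkX is installed. The node names, the `kind` attribute, and the colour and size values are made up for illustration; the `viz` dictionary is how NetworkX's GEXF writer stores colour and size for Gephi.

```python
import networkx as nx


def build_graph():
    """Build a tiny two-type graph; every name and value here is hypothetical."""
    graph = nx.Graph()
    graph.add_node("Paper A", kind="newspaper")
    graph.add_node("Paper B", kind="newspaper")
    graph.add_node("Text 1", kind="text")
    graph.add_edge("Paper A", "Text 1", weight=1)
    graph.add_edge("Paper B", "Text 1", weight=2)

    # Attach a "viz" dict to each node so the GEXF output carries
    # colour and size information that Gephi can display.
    for _name, data in graph.nodes(data=True):
        is_paper = data["kind"] == "newspaper"
        colour = (200, 60, 60) if is_paper else (60, 60, 200)
        data["viz"] = {
            "color": {"r": colour[0], "g": colour[1], "b": colour[2], "a": 1.0},
            "size": 20.0 if is_paper else 10.0,
        }
    return graph


graph = build_graph()

# generate_gexf yields the GEXF document line by line.
gexf_text = "\n".join(nx.generate_gexf(graph))

# To write a file that Gephi can open instead:
# nx.write_gexf(graph, "example.gexf")
```

Opening the resulting file in Gephi should show the two kinds of node already coloured and sized, so only the layout remains to be chosen.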
Scott Weingart on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1791
Tue, 13 Nov 2012 11:21:25 +0000

Great to hear it, looking forward to the results.

Ryan Cordell on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1790
Tue, 13 Nov 2012 11:00:05 +0000

Thanks so much for your help on this, Scott. The plugin method seems to have worked, though I still need to clean up the resulting graph: <a href="https://dl.dropbox.com/u/492930/Gephi%200.8.1%20beta%20-%20ChronAm-3000-2.gephi.png" rel="nofollow">https://dl.dropbox.com/u/492930/Gephi%200.8.1%20beta%20-%20ChronAm-3000-2.gephi.png</a>

Scott Weingart on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1789
Mon, 12 Nov 2012 22:25:29 +0000

So, this is a slightly more complicated problem than it ought to be. Instead of importing a network as an edge list, you have to import your data as separate node and edge lists, as described here: <a href="https://gephi.org/users/supported-graph-formats/spreadsheet/" rel="nofollow">https://gephi.org/users/supported-graph-formats/spreadsheet/</a>

In the node list, you'll need to add a node attribute (an extra column) that labels the 'type' of each node: whether it is a newspaper title or an ID. Once that network is loaded, you should be able to follow Shawn's steps.
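One way to produce the separate node and edge lists is a short script. The sketch below is an illustration under assumptions, not Scott's own method: the rows are made-up (text ID, newspaper title) pairs, and the column headers `Id`, `Label`, `Source`, `Target`, and `Type` follow Gephi's spreadsheet import conventions, with `type` as the extra node column described above.

```python
import csv
import io

# Hypothetical reprint rows: (text ID, newspaper title).
ROWS = [
    ("text-001", "Paper A"),
    ("text-001", "Paper B"),
    ("text-002", "Paper B"),
    ("text-002", "Paper C"),
    ("text-002", "Paper A"),
]


def build_tables(rows):
    """Return (nodes, edges): nodes maps node id -> type, edges is a list of pairs."""
    nodes = {}
    edges = []
    for text_id, paper in rows:
        nodes.setdefault(paper, "newspaper")
        nodes.setdefault(text_id, "text")
        # Every edge runs from a newspaper title to a reprinted text.
        edges.append((paper, text_id))
    return nodes, edges


def to_csv(header, records):
    """Serialise a header and records to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()


nodes, edges = build_tables(ROWS)
nodes_csv = to_csv(["Id", "Label", "type"], [(n, n, t) for n, t in nodes.items()])
edges_csv = to_csv(["Source", "Target", "Type"], [(s, t, "Undirected") for s, t in edges])
```

Saving `nodes_csv` and `edges_csv` to two files gives one table to import as the nodes table and one as the edges table in Gephi's Data Laboratory.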
Ryan Cordell on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1787
Mon, 12 Nov 2012 15:58:53 +0000

Right, Scott, two sides of the same network: one with texts themselves as nodes and the other with newspaper titles as nodes. What I'm asking for help on is your recommendation: "create a bimodal network." Should I import my spreadsheet into Gephi as an edge graph, with the IDs as source and the Newspaper titles as target, and then use the plugin Shawn references to convert that graph to a 1-mode network?

Scott Weingart on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1786
Mon, 12 Nov 2012 15:31:06 +0000

I'm not completely clear what you're looking for by the descriptions, so let me try to re-word it. Do you mean you're looking for how pairs of reprinted texts co-occur based on which publications they share? And then you're looking for how pairs of newspaper titles connect to one another, based on which texts they share? So, two sides of the same network?

If that's the case, the first thing you should do is create a bimodal network. That is, every edge goes from a newspaper title to a reprinted text. You can then follow Shawn's steps here: <a href="http://electricarchaeology.ca/2012/04/04/converting-2-mode-with-multimodal-plugin-for-gephi/" rel="nofollow">http://electricarchaeology.ca/2012/04/04/converting-2-mode-with-multimodal-plugin-for-gephi/</a> to create text-text networks or newspaper-newspaper networks.
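The same bimodal-to-one-mode conversion can also be sketched in a script before the data reaches Gephi. This is not the multimodal plugin's own algorithm, just a plain co-occurrence count over made-up rows: two newspapers are linked once for every text they share, and two texts once for every newspaper they share. The function name `project` and its `keep` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical reprint rows: (text ID, newspaper title).
ROWS = [
    ("text-001", "Paper A"),
    ("text-001", "Paper B"),
    ("text-002", "Paper B"),
    ("text-002", "Paper C"),
    ("text-002", "Paper A"),
]


def project(rows, keep="paper"):
    """Project the bimodal rows onto one node type.

    keep="paper" links newspapers that share texts;
    keep="text" links texts that share newspapers.
    Returns a Counter mapping (node, node) pairs to shared counts.
    """
    groups = defaultdict(set)
    for text_id, paper in rows:
        if keep == "paper":
            groups[text_id].add(paper)
        else:
            groups[paper].add(text_id)

    weights = Counter()
    for members in groups.values():
        # Sorting makes each undirected pair appear in one fixed order.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights


paper_edges = project(ROWS, keep="paper")
text_edges = project(ROWS, keep="text")
```

Each key/value pair in the result is one weighted edge, so it can be written out as a source, target, weight row for Gephi's edge import.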
Ryan Cordell on "Automatically Preparing Edge/Node Data for Gephi"
http://digitalhumanities.org/answers/topic/automatically-preparing-edgenode-data-for-gephi#post-1785
Mon, 12 Nov 2012 14:51:15 +0000

Okay, I've done some work with Gephi lately, but I find myself with a problem I can't quite solve. I work on reprinting networks, and thus far have generated network graphs from spreadsheets of reprinting with the original newspaper in one column (source) and the reprinting newspaper in the second (target). Import edge table --> Gephi creates a pretty graph.

I now have a much larger spreadsheet generated from a text-mining experiment I've started with a colleague in computer science. This spreadsheet includes for each found text:

an ID number identifying a particular reprinted text (ex: 8679:5136:18458:8488:5042:872:3924:2547:21444) | Date of each reprinting | URL of source text | Name of each publication | City and State of Publication | Longitude of Publication | the text matched

So there might be 10 lines with the same ID number--the "same text"--but different values in the other columns for each new reprinting of that text we found.

I want to generate two opposite but complementary graphs from this data:

1.) In the first, the nodes would be Newspaper titles, and the edges would represent shared reprints--the ID field, I suppose. In other words, edges would be drawn between papers that reprinted the same text. Edges would be larger the more texts the two shared. I suspect there will be a multi-stage process to prepare my data to do this, but I'm honestly not sure where to start.

2.) In the second, the nodes would be individual reprinted texts (the ID field for now, though we're working on generating titles) and the edges would be publications. Edges would be drawn between texts that appeared in the same newspaper.

Any help you can offer would be appreciated.
I can't find a way to do this in one step through Gephi, so I'm sure there's some data massaging ahead of me.

Gaet86 on "What tools can be used to create topic model network graphs?"
http://digitalhumanities.org/answers/topic/what-tools-can-be-used-to-create-topic-model-network-graphs#post-1674
Mon, 04 Jun 2012 11:21:37 +0000

Hi boys, I have many topic models created with the Mallet library, of this type:

TOPIC 1
school 0.3
teacher 0.2
science 0.08
mathematics 0.07
matter 0.05
student 0.03

I want to generate a network where each topic is a node. I tried to use Gephi, but I do not know how to import all the topics into a csv file. I kindly ask if you can help. Thanks in advance... Gaetano

rjlewis on "What tools can be used to create topic model network graphs?"
http://digitalhumanities.org/answers/topic/what-tools-can-be-used-to-create-topic-model-network-graphs#post-1626
Wed, 02 May 2012 10:50:06 +0000

Do you really mean that you want nodes for documents and topics, a <dfn>bimodal graph</dfn>? In that case your graph would have a small number of nodes (the topic nodes) with high centrality, and then thousands of small nodes (the document nodes) with low centrality. If this is the case, how are you calculating the topic weight for a document?

It seems to make more sense to me to have nodes for just documents, and edges between documents that share a topic: a <dfn>multigraph</dfn>. Then the greater the number of edges between two nodes, the closer they are in topic. Alternatively, you could define the edge weight to be a function of the number of topics two documents have in common, which basically amounts to the same thing but removes the need to represent multigraphs.
As for tools to visualise this, here's some Perl which creates a GraphML file from a list of documents titled A, B, C, D, E, F, G, and H, each of which covers one or more of the topics 1, 2, 3, 4, 5, 6, or 7:

<div class="bb_syntax"><div class="code"><pre class="perl" style="font-family:monospace;">#!/usr/bin/perl

use strict;
use warnings;
use Graph::Easy;

my $graph  = Graph::Easy->new;
my $topics = {};    # topic => titles of the documents seen so far

while (my $line = <DATA>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    my ($title, $topic) = split /,/, $line;

    my $document = $graph->add_node($title);

    # Link this document to every earlier document that shares the topic
    $graph->add_edge_once($document, $_) foreach @{ $topics->{$topic} };

    push @{ $topics->{$topic} }, $title;
}

print $graph->as_graphml;

__END__
A,2
A,4
B,1
B,2
B,6
C,2
C,3
D,1
D,2
D,5
D,7
E,2
E,3
F,1
F,2
F,6
G,1
G,5
G,6
G,7
H,1
H,2
H,4
H,7</pre></div></div>

I tried importing the output of this in Gephi and it looked basically correct.

By the way, when you say "topic model", are your topics just keywords? Or are you talking about vectors of word frequencies?

Lisa Rhody on "What tools can be used to create topic model network graphs?"
http://digitalhumanities.org/answers/topic/what-tools-can-be-used-to-create-topic-model-network-graphs#post-1625
Wed, 02 May 2012 10:47:21 +0000

Shawn, Yes! That's what I was looking for. I'm sorry that I somehow missed it on your blog, but I'm grateful that you took the time to explain it here. For some reason I couldn't wrap my head around how the .csv file needed to be formatted to get it the way I wanted it in Gephi. I haven't tried it yet, but I'm about to. Thank you for the generous reply! -Lisa

Shawn on "What tools can be used to create topic model network graphs?"
http://digitalhumanities.org/answers/topic/what-tools-can-be-used-to-create-topic-model-network-graphs#post-1623
Wed, 02 May 2012 10:15:54 +0000

Hi Lisa, I've written about this sort of thing on my blog a few times - <a href="http://electricarchaeologist.wordpress.com/" rel="nofollow">http://electricarchaeologist.wordpress.com/</a>

Take your topic modeling composition data. Create a spreadsheet where you have three columns: source, target, and weight. Put your docs and topics under source and target as appropriate, and then the percentage composition under weight. Save as a csv file.

Then, in Gephi, create a new project. Click on 'data laboratory'. Click on 'edges' under 'data table'. Click 'import spreadsheet'. Navigate to your csv file. Make sure the 'as table' option is set to edges table. Click next, then click finish.

Then go back to the 'overview' pane, and down the left-hand side under layout you can select different algorithms that'll take the edge weight into account.

...is that the kind of thing you had in mind? You can also include a 'type' column in your csv file, with 'directed' or 'undirected' as appropriate.

Lisa Rhody on "What tools can be used to create topic model network graphs?"
http://digitalhumanities.org/answers/topic/what-tools-can-be-used-to-create-topic-model-network-graphs#post-1622
Wed, 02 May 2012 10:02:41 +0000

Replying to @<a href='/profile/parezcoydigo'>parezcoydigo</a>'s <a href="http://digitalhumanities.org/answers/topic/what-tools-can-be-used-to-create-topic-model-network-graphs#post-1621">post</a>: That tool looks fantastic because of its flexibility and because it can be worked right into the running of the model. Unfortunately, at this point I don't have the Python scripting ability to really use it right away. Do you know of something with a GUI that offers the same flexibility?
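For readers who would rather script the spreadsheet step from Shawn's reply than build it by hand, here is a minimal Python sketch. The composition values are invented, the dictionary layout is an assumption and not MALLET's actual output format, and the `min_weight` cut-off is an added convenience that the thread does not mention.

```python
import csv
import io

# Hypothetical document-topic composition: document -> {topic: share}.
COMPOSITION = {
    "doc1": {"topic0": 0.70, "topic1": 0.25, "topic2": 0.05},
    "doc2": {"topic0": 0.10, "topic1": 0.15, "topic2": 0.75},
}


def edge_rows(composition, min_weight=0.1):
    """Yield (source, target, weight) for each document-topic pair kept."""
    for doc, topics in composition.items():
        for topic, share in topics.items():
            # Dropping very small shares keeps the graph readable (an assumption).
            if share >= min_weight:
                yield doc, topic, share


def edges_csv(composition, min_weight=0.1):
    """Return CSV text with source, target, weight, and type columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "weight", "type"])
    for doc, topic, share in edge_rows(composition, min_weight):
        writer.writerow([doc, topic, share, "undirected"])
    return buffer.getvalue()


csv_text = edges_csv(COMPOSITION)
```

Saving `csv_text` to a file gives the csv that the import steps in Shawn's reply expect, with the optional 'type' column already filled in.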