<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>Digital Humanities Questions &#38; Answers &#187; Tag: software - Recent Posts</title>
		<link>http://digitalhumanities.org/answers/tags/software</link>
		<description>Digital Humanities Questions &amp; Answers &#187; Tag: software - Recent Posts</description>
		<language>en-US</language>
		<pubDate>Wed, 19 Jun 2013 06:29:35 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://digitalhumanities.org/answers/search.php</link>
		</textInput>
		<atom:link href="http://digitalhumanities.org/answers/rss/tags/software" rel="self" type="application/rss+xml" />

		<item>
			 
				<title>Joel Kalvesmaki on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1913</link>
			<pubDate>Thu, 07 Mar 2013 14:22:39 +0000</pubDate>
			<dc:creator>Joel Kalvesmaki</dc:creator>
			<guid isPermaLink="false">1913@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;You could try indexing software such as &#60;a href=&#34;http://www.pdfindexgenerator.com/&#34; rel=&#34;nofollow&#34;&#62;http://www.pdfindexgenerator.com/&#60;/a&#62;. But it sounds as if the level of quality and detail to which you aspire would be best handled not so much by software but by hiring a professional indexer who already uses such software and can write a strong index in a timely manner. Of course, if you have more time than money, this may not be feasible.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>olaf on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1911</link>
			<pubDate>Wed, 06 Mar 2013 20:25:43 +0000</pubDate>
			<dc:creator>olaf</dc:creator>
			<guid isPermaLink="false">1911@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @&#60;a href='http://digitalhumanities.org/answers/profile/olaf'&#62;olaf&#60;/a&#62;'s &#60;a href=&#34;http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1910&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;The OCR idea seems to be a bust, unfortunately. It's a pain to convert to a &#34;flat&#34; file without renderable text. Then, even the newest version of Acrobat is finding it difficult to understand diacritics, italics and anything else non-standard. I think I'll be better off working with mistakes that follow a regular pattern (such as a≠ always equals ā) and working on a script or something to do mass replacements.&#60;br /&#62;
Don't know why Adobe doesn't allow OCR of files with renderable text in them. What could be the harm?
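&#60;/p&#62;
&#60;p&#62;A minimal sketch of the kind of mass-replacement script described above, in Python. The mapping table is an assumption for illustration: only the a≠ pair comes from this post, the other pairs are hypothetical and would have to be checked against the old fonts, and the function name is made up here.&#60;/p&#62;

```python
# Sketch of a mass-replacement pass over text extracted from the old PDFs.
# Only the "a≠" pair is taken from the post; the other pairs are
# hypothetical and must be verified against the original fonts.
REPLACEMENTS = {
    "a≠": "ā",  # from the post
    "i≠": "ī",  # hypothetical
    "u≠": "ū",  # hypothetical
}


def fix_legacy_encoding(text, replacements=REPLACEMENTS):
    """Replace each garbled sequence with its intended character."""
    # Longest keys first, so a longer pattern is never split by a shorter one.
    for garbled in sorted(replacements, key=len, reverse=True):
        text = text.replace(garbled, replacements[garbled])
    return text


fixed = fix_legacy_encoding("maqa≠mah")  # "maqāmah"
```

&#60;p&#62;Plain string replacement only covers garbling that follows a regular pattern; anything irregular would still need manual correction.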
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>olaf on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1910</link>
			<pubDate>Wed, 06 Mar 2013 19:00:33 +0000</pubDate>
			<dc:creator>olaf</dc:creator>
			<guid isPermaLink="false">1910@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @Peter Organisciak's &#60;a href=&#34;http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1909&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;Thanks for the tips.&#60;/p&#62;
&#60;p&#62;Not automated. Just more convenient, and perhaps with some automated features to help with the actual index creation.&#60;/p&#62;
&#60;p&#62;I hadn't thought of running any OCR on the older files, since I made them many years ago from the original Word or Nisus files (i.e., they were never scanned or OCRed), but that's a great idea that I'm about to try. Don't know if OCR will ignore the text that's already 'live' though, or if I'll have to flatten them first.&#60;/p&#62;
&#60;p&#62;I'll definitely take a stroll through the research and see what I can find.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>Peter Organisciak on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1909</link>
			<pubDate>Wed, 06 Mar 2013 18:41:49 +0000</pubDate>
			<dc:creator>Peter Organisciak</dc:creator>
			<guid isPermaLink="false">1909@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @&#60;a href='http://digitalhumanities.org/answers/profile/olaf'&#62;olaf&#60;/a&#62;'s &#60;a href=&#34;http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1908&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;I believe you're looking for an automated way to create a back-of-the-book index, correct? 'Indexing' tends to refer to building indices for information retrieval (such as Terrier and Lucene's PDF parsers), which is why you couldn't find it on Google.&#60;/p&#62;
&#60;p&#62;Back-of-the-book indexes are tough to parse. Patrick Juola wrote about the need for such software and the technical challenges in &#60;a href=&#34;http://llc.oxfordjournals.org/content/23/1/73.full?sid=f01a5ee3-2477-4711-8fff-d42916eead6d&#34;&#62;Killer Applications for Digital Humanities&#60;/a&#62;. If I recall, he had early work in the area: I'm not sure what came of it. &#60;/p&#62;
&#60;p&#62;I don't know if there is any software that would do what you need. However, since it's a tough problem, you can be sure that researchers have tried it. Your best bet is to look through the research literature and see if any researchers have released their code. A scholar search for 'back-of-the-book indexing' along with keywords like 'unsupervised', 'semi-supervised', or 'automated' gave me some potentially useful articles. Still, you'd probably have to split the problem into two parts (parsing PDFs to text and generating an index), as I suspect there aren't any tools mature enough to include PDF parsing.&#60;/p&#62;
&#60;p&#62;To be honest, your approach of going through manually and highlighting notable terms sounds more tractable to me. With the OCR problems: have you tried re-applying text recognition on the older issues with the newest version of Acrobat Professional? Their OCR improves often.&#60;/p&#62;
&#60;p&#62;Sorry that I don't have a better answer for you. Good luck.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>olaf on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1908</link>
			<pubDate>Wed, 06 Mar 2013 18:21:41 +0000</pubDate>
			<dc:creator>olaf</dc:creator>
			<guid isPermaLink="false">1908@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;One more wish for the wishlist: a way to designate a term as fitting into more than one topic in the index. For example, al-Zahir Baybars would be indexed as himself and under &#34;sultans&#34;.
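&#60;/p&#62;
&#60;p&#62;A minimal sketch of how one marked passage could be filed under more than one heading, in Python. The data structure, the function name, and the page numbers are all assumptions for illustration, not features of any existing indexing tool.&#60;/p&#62;

```python
from collections import defaultdict

# Maps each index heading to the set of pages where it is discussed.
index = defaultdict(set)


def add_entry(index, page, *headings):
    """Record one marked passage under every heading it belongs to."""
    for heading in headings:
        index[heading].add(page)


# Page numbers here are made up for illustration.
add_entry(index, 12, "al-Zahir Baybars", "sultans")
add_entry(index, 47, "al-Zahir Baybars")

for heading in sorted(index):
    print(heading, sorted(index[heading]))
```

&#60;p&#62;Using a set per heading means that marking the same page twice under one heading does not produce a duplicate page reference.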
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>olaf on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1907</link>
			<pubDate>Wed, 06 Mar 2013 18:09:23 +0000</pubDate>
			<dc:creator>olaf</dc:creator>
			<guid isPermaLink="false">1907@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @Dorothea Salo's &#60;a href=&#34;http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1905&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;I mean a real index, not a concordance. The need to leave out passing mentions is one of the reasons that no software will be able to automate the process.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>olaf on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1906</link>
			<pubDate>Wed, 06 Mar 2013 18:07:37 +0000</pubDate>
			<dc:creator>olaf</dc:creator>
			<guid isPermaLink="false">1906@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;One thing I've been playing with today is going through a pdf and using the highlight tool on words/phrases, in the hope that I can then export the comments list (which has page numbers) to some format I can work with. Doesn't work very well for the older issues with the messy fonts, since you can't always tell what the word was supposed to be (Ṣubḥ becomes ˝ubh˝S and maqāmah becomes maqa≠mah or mah≠maqa, and words with lots of diacritics become almost unrecognizable as words). Those fonts were on long-dead Macs running OS7-OS9, so aren't available to me now.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>Dorothea Salo on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1905</link>
			<pubDate>Wed, 06 Mar 2013 17:59:59 +0000</pubDate>
			<dc:creator>Dorothea Salo</dc:creator>
			<guid isPermaLink="false">1905@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;I'm confused. Are you making a concordance (list of words/phrases present in text with pointers), or an index (synthesized list of important terminology, with pointers to meaningful mentions while omitting passing ones)? They're not at all the same thing.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>olaf on "How create a real (with page numbers) index of journal&#039;s entire run, from PDFs?"</title>
						<link>http://digitalhumanities.org/answers/topic/how-create-a-real-with-page-numbers-index-of-journals-entire-run-from-pdfs#post-1904</link>
			<pubDate>Wed, 06 Mar 2013 17:57:10 +0000</pubDate>
			<dc:creator>olaf</dc:creator>
			<guid isPermaLink="false">1904@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;I need to index all back issues of &#60;em&#62;&#60;a href=&#34;http://mamluk.uchicago.edu&#34;&#62;Mamluk Studies Review&#60;/a&#62;&#60;/em&#62; (open access, now digital only but formerly print) but have not had much luck finding ideas about how to go about it.&#60;br /&#62;
Searching the Web for info about indexing PDFs leads largely to results about indexing them on a computer for improved searches, or to indexing services.&#60;br /&#62;
I hope to find software (or scripts or something!) that can &#60;/p&#62;
&#60;ul&#62;
&#60;li&#62;read PDF files&#60;/li&#62;
&#60;li&#62;understand the idea of page numbers&#60;/li&#62;
&#60;li&#62;understand that each page in a pdf is a distinct entity&#60;/li&#62;
&#60;li&#62;handle Unicode and diacritics (and, ideally, Arabic script)&#60;/li&#62;
&#60;li&#62;see phrases or hyphenated words that break across pages as single items&#60;/li&#62;
&#60;/ul&#62;
&#60;p&#62;I don't expect anything to happen automatically: I know I (or better yet an unwary grad student) will have to actually go through and mark words and phrases to be included in the index.&#60;/p&#62;
&#60;p&#62;Bonus points if it can be taught to ignore certain strings when alphabetizing. For example, since 'al-' is Arabic for 'the', it doesn't affect alphabetization (so al-Nasir Muhammad goes in the N section).&#60;br /&#62;
Similarly, there needs to be a way to instruct it that ā and a are the same for purposes of alphabetization, as are ṣ and s, etc.&#60;/p&#62;
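&#60;p&#62;The two alphabetization rules above can be sketched as a sort key in Python. This is only an illustration under stated assumptions: the function name and the sample terms are made up here, and the prefix list contains just the 'al-' case mentioned above.&#60;/p&#62;

```python
import unicodedata

# Prefixes to skip when alphabetizing. Only "al-" comes from the post.
IGNORED_PREFIXES = ("al-",)


def sort_key(term):
    """Return a key that ignores a leading 'al-' and any diacritics."""
    key = term.casefold()
    for prefix in IGNORED_PREFIXES:
        if key.startswith(prefix):
            key = key[len(prefix):]
            break
    # NFD splits a letter such as "ā" into "a" plus a combining macron;
    # dropping the combining marks leaves the bare base letter.
    decomposed = unicodedata.normalize("NFD", key)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Sample terms chosen for illustration only.
terms = ["sultans", "al-Nasir Muhammad", "Ṣubḥ", "maqāmah"]
ordered = sorted(terms, key=sort_key)
```

&#60;p&#62;With this key, al-Nasir Muhammad sorts under N and Ṣubḥ sorts as if it were spelled Subh. Transliteration characters that are not combining marks, such as the ʿayn sign, would pass through unchanged and would need their own rule.&#60;/p&#62;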
&#60;p&#62;Super bonus points if it can recognize (or learn to recognize) variations on a word or phrase in terms of spelling (often inconsistent when transliteration is involved), word order or intervening words.&#60;/p&#62;
&#60;p&#62;What I have: 23 issues of the journal as whole-book pdfs, as well as individual pdfs of all articles. Unfortunately, the first half dozen or so were created without Unicode, using proprietary fonts with non-standard encodings. Messy, but I can work around it somehow. I also have InDesign files (various versions) for about half the issues. This will all be done in Windows (32-bit XP and 64-bit 7). I always have the latest version of Acrobat (not reader, the full program).&#60;/p&#62;
&#60;p&#62;The resulting index will be posted on the Web, probably both as a PDF and in some more dynamic and usable format(s). &#60;/p&#62;
&#60;p&#62;Any ideas for ways to streamline this would be appreciated. &#60;/p&#62;
&#60;p&#62;Thanks!&#60;br /&#62;
Olaf
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>yodj84@gmail.com on "What is the best software package for social network analysis?"</title>
						<link>http://digitalhumanities.org/answers/topic/what-is-the-best-software-package-for-social-network-analysis#post-1900</link>
			<pubDate>Tue, 05 Mar 2013 04:04:20 +0000</pubDate>
			<dc:creator>yodj84@gmail.com</dc:creator>
			<guid isPermaLink="false">1900@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @msatlow@gmail.com's &#60;a href=&#34;http://digitalhumanities.org/answers/topic/what-is-the-best-software-package-for-social-network-analysis#post-1098&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;If you are a student and need SNA software for your coursework, you can get a NetMiner 4 coursework license for free for 6 months. It features statistics and visualization as well as SNA, and it's very easy to use. I think it is worth trying. :-)&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://www.netminer.com&#34; rel=&#34;nofollow&#34;&#62;http://www.netminer.com&#60;/a&#62;
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>Trip Kirkpatrick on "What hardware and software would you put in a DH/Multimedia lab?"</title>
						<link>http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1688</link>
			<pubDate>Wed, 13 Jun 2012 12:14:00 +0000</pubDate>
			<dc:creator>Trip Kirkpatrick</dc:creator>
			<guid isPermaLink="false">1688@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @&#60;a href='http://digitalhumanities.org/answers/profile/triplingual'&#62;triplingual&#60;/a&#62;'s &#60;a href=&#34;http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1687&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;To be more generous to Lincoln's intent, I should add that the notion of what you can support should include maintaining the ability to be accommodating when a critical mass of users would like something installed on all the workstations.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>Trip Kirkpatrick on "What hardware and software would you put in a DH/Multimedia lab?"</title>
						<link>http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1687</link>
			<pubDate>Wed, 13 Jun 2012 11:28:15 +0000</pubDate>
			<dc:creator>Trip Kirkpatrick</dc:creator>
			<guid isPermaLink="false">1687@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @Michael Widner's &#60;a href=&#34;http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1680&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;Even though I haven't set up a DH lab, I have managed a public cluster and so wanted to throw in a few words for considering this aspect of things. From that perspective, one part of the answer to your question has to be, &#34;Nothing you can't support.&#34; By 'support' I don't mean that you have trained ninjas (or pirates or vampires, depending on where you fall in that religious war) for every application, but that you have funding, licensing, data, upgrade, and security management capabilities for everything that goes in. On that last note, I must say that when I was managing a public cluster, there's no way I would have gone with Lincoln's idea of allowing lab users to install their own software. That has huge potential to bork a system inside of 30 seconds. By the same token, providing support for users installing whatever on their own machines is a great idea. Teach a DHer to fish and all that.&#60;/p&#62;
&#60;p&#62;One management tool and approach you might consider is Deep Freeze or one of its ilk. That software resets a computer to a default state at a desired interval. I did not use it in my cluster management work, but I know entities on campus at my institution do.&#60;br /&#62;
There are similar programs for doing centralized patch management, but not all software plays nicely with this approach. You'll need to decide whether you mind doing manual, one-by-one updates on the machines or whether everything must be tractable from an MCP.&#60;/p&#62;
&#60;p&#62;Finally, despite &#60;a href=&#34;http://twitter.com/#!/miriamkp/status/212577424330858496&#34;&#62;some opposition to smartboards&#60;/a&#62;, I'd recommend one (or similar device or software overlay), if only because some of the people you are working with will likely sooner or later need to know intimately their features, benefits, and advantages.&#60;/p&#62;
&#60;p&#62;[EDITED to make smartboard ref a proper link, and to add paragraph spacing]
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>lmullen on "What hardware and software would you put in a DH/Multimedia lab?"</title>
						<link>http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1686</link>
			<pubDate>Wed, 13 Jun 2012 02:13:46 +0000</pubDate>
			<dc:creator>lmullen</dc:creator>
			<guid isPermaLink="false">1686@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;Ideally the people using the computers would also be able to install their own software to fit their specific purposes.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>Amanda Visconti on "What hardware and software would you put in a DH/Multimedia lab?"</title>
						<link>http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1685</link>
			<pubDate>Wed, 13 Jun 2012 01:06:44 +0000</pubDate>
			<dc:creator>Amanda Visconti</dc:creator>
			<guid isPermaLink="false">1685@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @Michael Widner's &#60;a href=&#34;http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1683&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;I'm not a Final Cut expert, but I've been using it to do pretty basic things (splicing, adjusting audio, adding text overlays) for lecture and demo videos without a hitch. Working with a simple consumer-aimed thing like iMovie before moving on to Final Cut certainly helped me understand the more advanced software better, so perhaps that's a better (and cheaper?) initial tool for graduate students.&#60;/p&#62;
&#60;p&#62;Some other things you might consider adding:&#60;br /&#62;
1. A friendly FTP GUI (Fetch or Filezilla).&#60;br /&#62;
2. This isn't exactly software, but you might provide encouragement to get familiar with the command line by, for example, keeping the Terminal icon in the Mac dock, posting simple commands students can emulate (e.g. you just transferred that file in Fetch, here's how to do the same thing way quicker via the command line; or better yet, here's how to customize your command prompt with custom greetings and colors as a way to feel like it's your own tool).&#60;br /&#62;
3. A good text editor (anything that does some color coding and handles closing tags; I love TextWrangler and also use Oxygen for TEI work).&#60;br /&#62;
4. A GitHub GUI (GitHub for Mac is great, and a Windows version came out recently) and a GitHub account. Not only a valuable tool, but a good way to keep earlier versions of work safe when people are trying new things.&#60;/p&#62;
&#60;p&#62;Would be interested to hear back as your space develops. Have fun!
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>Michael Widner on "What hardware and software would you put in a DH/Multimedia lab?"</title>
						<link>http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1684</link>
			<pubDate>Tue, 12 Jun 2012 16:19:46 +0000</pubDate>
			<dc:creator>Michael Widner</dc:creator>
			<guid isPermaLink="false">1684@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @&#60;a href='http://digitalhumanities.org/answers/profile/jenniferserventi'&#62;JenniferServenti&#60;/a&#62;'s &#60;a href=&#34;http://digitalhumanities.org/answers/topic/what-hardware-and-software-would-you-put-in-a-dhmultimedia-lab#post-1682&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;Thanks Jennifer. Most of that thread is about arranging tables and whiteboards and such: very useful, but not applicable to my case, unfortunately.
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
