Technical Blog

Create a new Wikipedia Miner application with its Java API

We saw in a previous article that Wikipedia Miner is a great framework for building algorithms on top of Wikipedia. It makes these algorithms easy to query through web services hosted on Tomcat.

However, if you don’t want to use Tomcat, you can also create a standalone program that uses all the power of the framework. This article shows how.

In this article, we assume that you already have a working Wikiminer setup (databases and configuration files).

Create a basic class

Our example is deliberately simple: rather than implementing new functionality, we focus on creating the program and compiling it.

Here, we write a program that reads page ids from standard input and displays their associated titles. The path to the configuration file is passed on the command line.

The different steps are the following:

  • Create a WikipediaConfiguration from the configuration file (wikipedia.xml).
  • Create a Wikipedia object that loads the databases described by the WikipediaConfiguration.
  • Call Wikipedia methods to run the algorithms.

package fr.gauth.wikiminer;

import java.io.File;
import java.util.Scanner;

import org.wikipedia.miner.model.Page;
import org.wikipedia.miner.model.Wikipedia;
import org.wikipedia.miner.util.WikipediaConfiguration;

public class IdToTitlePrompt
{
	// Builds a WikipediaConfiguration from the XML file given as the only
	// command-line argument, exiting with an error message if it is unusable.
	protected static WikipediaConfiguration getConfiguration(String[] args)
	{
		if (args.length != 1)
		{
			System.err.println("Please specify the path to the Wikipedia configuration file");
			System.exit(1);
		}

		File confFile = new File(args[0]);
		if (!confFile.canRead())
		{
			System.err.println("'" + args[0] + "' cannot be read");
			System.exit(1);
		}

		WikipediaConfiguration conf = null;
		try
		{
			conf = new WikipediaConfiguration(confFile);

			// The data directory must exist: it holds the Wikiminer databases
			if (conf.getDataDirectory() == null
					|| !conf.getDataDirectory().isDirectory())
			{
				System.err.println("'" + args[0]
						+ "' does not specify a valid data directory");
				System.exit(1);
			}
		} catch (Exception e)
		{
			e.printStackTrace();
			System.exit(2);
		}
		return conf;
	}

	public static void main(String[] args) throws Exception
	{
		WikipediaConfiguration conf = getConfiguration(args);

		// If the 2nd argument is true, the databases are prepared and cached
		// in a separate thread, so the code can run immediately rather than
		// waiting for the cache to be filled.
		Wikipedia wikipedia = new Wikipedia(conf, true);

		// Read page ids from standard input until a non-integer token or EOF
		Scanner sc = new Scanner(System.in);
		try
		{
			while (sc.hasNextInt())
			{
				int id = sc.nextInt();
				Page page = wikipedia.getPageById(id);
				// Defensive check before dereferencing the result
				if (page == null)
					System.out.println("No page with id " + id);
				else
					System.out.println(page.getTitle());
			}
		} finally
		{
			// Always release the input and the database environment
			sc.close();
			wikipedia.close();
		}
	}
}
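To exercise the prompt loop without loading the Wikipedia databases, one option is to factor it out behind a lookup function. The sketch below is a hypothetical refactoring, not part of Wikipedia Miner: the class IdPromptLoop and its run method are names chosen for illustration, and only standard Java types are used.

```java
import java.io.PrintStream;
import java.util.Scanner;
import java.util.function.IntFunction;

// Hypothetical helper: the read-id / print-title loop from IdToTitlePrompt,
// decoupled from Wikipedia so it can run with any id-to-title lookup.
class IdPromptLoop
{
	// Reads integers until a non-integer token or end of input, printing the
	// title returned by the lookup, or a fallback message when it returns null.
	// Returns the number of ids processed.
	static int run(Scanner in, PrintStream out, IntFunction<String> lookup)
	{
		int count = 0;
		while (in.hasNextInt())
		{
			int id = in.nextInt();
			String title = lookup.apply(id);
			out.println(title != null ? title : "No page with id " + id);
			count++;
		}
		return count;
	}
}
```

In main, the loop could then become IdPromptLoop.run(sc, System.out, id -> wikipedia.getPageById(id).getTitle()), which keeps the input handling separate from the database code and lets you test it with a plain Map standing in for Wikipedia.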

Compile and execute it

To compile and run the program, we update the original Ant build.xml so that it creates a standalone executable jar. To do so, we follow these steps:

  • Add a new custom target, which I called assembly (a copy of the existing package target).
  • Bundle the dependencies (the jars in lib/) into the jar.
  • Set the main class in the manifest, so the jar can be run directly.

We obtain the following entry:

    <target name="assembly" depends="build" description="creates an executable jar with its dependencies">
        <echo>Creating the runnable standalone jar file</echo>
        <mkdir dir="${build.dir}/jar"/>

        <jar destfile="${build.dir}/jar/${jar.mainModule}">
            <fileset dir="${build.dir}/classes"/>
            <!-- unpack every dependency jar from lib/ into the output jar -->
            <zipgroupfileset dir="lib" includes="*.jar"/>
            <manifest>
                <!-- lets the jar be started with "java -jar" -->
                <attribute name="Main-Class" value="fr.gauth.wikiminer.IdToTitlePrompt"/>
            </manifest>
        </jar>
    </target>
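Before calling java -jar, it can be worth checking that the manifest really carries the Main-Class attribute. The sketch below uses only the standard java.util.jar API; the class name ManifestCheck is hypothetical, and the jar path is whatever ${build.dir}/jar/${jar.mainModule} resolves to in your build.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reads the Main-Class entry from a jar's manifest,
// returning null when the jar has no manifest or no Main-Class attribute.
class ManifestCheck
{
	static String mainClassOf(File jar) throws IOException
	{
		try (JarFile jarFile = new JarFile(jar))
		{
			Manifest manifest = jarFile.getManifest();
			if (manifest == null)
				return null;
			return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
		}
	}
}
```

For example, ManifestCheck.mainClassOf(new File("../build/jar/wikipedia-miner.jar")) should return fr.gauth.wikiminer.IdToTitlePrompt if the assembly target worked as intended.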

Finally, we can build and run it with the following command, passing the configuration file as the only argument:

ant assembly && java -jar ../build/jar/wikipedia-miner.jar ../configs/en.xml

The program then waits for input: typing 9232 in the shell displays “Eiffel Tower”.
